Syntactic Complexity of Prefix-, Suffix-, Bifix-, 
and Factor-Free Regular Languages * 



Janusz Brzozowski^1, Baiyu Li^1, and Yuli Ye^2

^1 David R. Cheriton School of Computer Science, University of Waterloo
Waterloo, ON, Canada N2L 3G1
email: {brzozo, b5li}@uwaterloo.ca
^2 Department of Computer Science, University of Toronto
Toronto, ON, Canada M5S 3G4
email: y3ye@cs.toronto.edu



Abstract. The syntactic complexity of a regular language is the cardinality
of its syntactic semigroup. The syntactic complexity of a subclass
of the class of regular languages is the maximal syntactic complexity of
languages in that class, taken as a function of the state complexity $n$
of these languages. We study the syntactic complexity of prefix-, suffix-,
bifix-, and factor-free regular languages. We prove that $n^{n-2}$ is a tight
upper bound for prefix-free regular languages. We present properties of the
syntactic semigroups of suffix-, bifix-, and factor-free regular languages,
conjecture tight upper bounds on their size to be $(n-1)^{n-2} + (n-2)$,
$(n-1)^{n-3} + (n-2)^{n-3} + (n-3)2^{n-3}$, and $(n-1)^{n-3} + (n-3)2^{n-3} + 1$,
respectively, and exhibit languages with these syntactic complexities.



Keywords: bifix-free, factor-free, finite automaton, monoid, prefix-free, regular

1 Introduction

A language is prefix-free (respectively, suffix-free, factor-free) if it does not
contain any pair of words such that one is a proper prefix (respectively, suffix,
factor) of the other. It is bifix-free if it is both prefix- and suffix-free. We refer
to prefix-, suffix-, bifix-, and factor-free languages as free languages. Nontrivial
prefix-, suffix-, bifix-, and factor-free languages are also known as prefix, suffix,
bifix, and infix codes [1,22], respectively, and have many applications in areas
such as cryptography, data compression, and information processing.

The state complexity of a regular language is the number of states in the
minimal deterministic finite automaton (DFA) recognizing that language. An
equivalent notion is that of quotient complexity, which is the number of left quotients
of the language. State complexity of regular operations has been studied quite
extensively: for surveys of this topic and lists of references we refer the reader
to [2,24]. With regard to free regular languages, Han, Salomaa and Wood [10]
examined prefix-free regular languages, and Han and Salomaa [9] studied
suffix-free regular languages. Bifix- and factor-free regular languages were studied by
Brzozowski, Jirásková, Li, and Smith [4].

The notion of quotient complexity can be derived from the Nerode right
congruence [17], while the Myhill congruence [16] leads to the syntactic semigroup
of a language and to its syntactic complexity, which is the cardinality of the
syntactic semigroup. It was pointed out in [5] that syntactic complexity can be
very different for regular languages with the same quotient complexity. Thus, for
a fixed $n$, languages with quotient complexity $n$ may possibly be distinguished
by their syntactic complexities.

In contrast to state complexity, syntactic complexity has not received much 
attention. In 1970 Maslov [13] dealt with the problem of generators of the semi- 
group of all transformations in the setting of finite automata. In 2003-2004, 
Holzer and Konig [11], and independently, Krawetz, Lawrence and Shallit [13] 
studied the syntactic complexity of languages with unary and binary alphabets. 
In 2010 Brzozowski and Ye [5] examined the syntactic complexity of ideal and 
closed regular languages, and in 2011 Brzozowski and Li ^ studied the syntactic 
complexity of star-free languages. Here, we deal with the syntactic complexity of 
prefix-, suffix-, bifix-, and factor-free regular languages, and their complements. 

Basic definitions and facts are stated in Sections 2 and 3. In Section 4 we
obtain a tight upper bound on the syntactic complexity of prefix-free regular
languages. In Sections 5-7 we study the syntactic complexity of suffix-, bifix-,
and factor-free regular languages, respectively. We state conjectures about tight
upper bounds for these classes, and exhibit languages in these classes that have
large syntactic complexities. In Section 8 we show that the upper bounds on
the quotient complexity of reversal of prefix-, suffix-, bifix-, and factor-free
regular languages can be met by our languages with largest syntactic complexities.
Section 9 concludes the paper.

2 Transformations 

A transformation of a set $Q$ is a mapping of $Q$ into itself. In this paper we consider
only transformations of finite sets, and we assume without loss of generality that
$Q = \{1, 2, \ldots, n\}$. Let $t$ be a transformation of $Q$. If $i \in Q$, then $it$ is the image
of $i$ under $t$. If $X$ is a subset of $Q$, then $Xt = \{it \mid i \in X\}$, and the restriction of
$t$ to $X$, denoted by $t|_X$, is a mapping from $X$ to $Xt$ such that $it|_X = it$ for all
$i \in X$. The composition of two transformations $t_1$ and $t_2$ of $Q$ is a transformation
$t_1 \circ t_2$ such that $i(t_1 \circ t_2) = (it_1)t_2$ for all $i \in Q$. We usually drop the composition
operator "$\circ$" and write $t_1 t_2$ for short. An arbitrary transformation can be written
in the form

$$t = \begin{pmatrix} 1 & 2 & \cdots & n-1 & n \\ i_1 & i_2 & \cdots & i_{n-1} & i_n \end{pmatrix},$$

where $i_k = kt$, $1 \le k \le n$, and $i_k \in Q$. The domain $\mathrm{dom}(t)$ of $t$ is $Q$. The
range $\mathrm{rng}(t)$ of $Q$ under $t$ is the set $\mathrm{rng}(t) = Qt$. We also use the notation
$t = [i_1, i_2, \ldots, i_n]$ for the transformation $t$ above.

A permutation of $Q$ is a mapping of $Q$ onto itself. In other words, a
permutation $\pi$ of $Q$ is a transformation where $\mathrm{rng}(\pi) = Q$. The identity transformation
maps each element to itself, that is, $it = i$ for $i = 1, \ldots, n$. A transformation $t$
contains a cycle of length $k$ if there exist pairwise different elements $i_1, \ldots, i_k$
such that $i_1 t = i_2$, $i_2 t = i_3$, $\ldots$, $i_{k-1} t = i_k$, and $i_k t = i_1$. A cycle is denoted
by $(i_1, i_2, \ldots, i_k)$. For $i < j$, a transposition is the cycle $(i, j)$, and $(i, i)$ is the
identity. A singular transformation, denoted by $\binom{i}{j}$, has $it = j$ and $ht = h$ for
all $h \ne i$, and $\binom{i}{i}$ is the identity. A constant transformation, denoted by $\binom{Q}{j}$, has
$it = j$ for all $i$.

The set of all transformations of a set $Q$, denoted by $\mathcal{T}_Q$, is a finite monoid.
The set of all permutations of $Q$ is a group, denoted by $\mathfrak{S}_Q$ and called the
symmetric group of degree $n$. It was shown in [12,19] that two generators are
sufficient to generate the symmetric group of degree $n$. In 1935 Piccard [18]
proved that three transformations of $Q$ are sufficient to generate the monoid
$\mathcal{T}_Q$. In the same year, Eilenberg showed that fewer than three generators are
not possible, as reported by Sierpiński [23]. We refer the reader to the book of
Ganyushkin and Mazorchuk [7] for a detailed discussion of finite transformation
semigroups. The following are well-known facts about generators of $\mathfrak{S}_Q$ and $\mathcal{T}_Q$:

Theorem 1 (Permutations, [12,19]). The symmetric group $\mathfrak{S}_Q$ of size $n!$ can
be generated by any cyclic permutation of $n$ elements together with any
transposition. In particular, $\mathfrak{S}_Q$ can be generated by $c = (1, 2, \ldots, n)$ and $t = (1, 2)$.

Theorem 2 (Transformations, [18]). The complete transformation monoid
$\mathcal{T}_Q$ of size $n^n$ can be generated by any cyclic permutation of $n$ elements together
with a transposition and a "returning" transformation $r = \binom{n}{1}$. In particular,
$\mathcal{T}_Q$ can be generated by $c = (1, 2, \ldots, n)$, $t = (1, 2)$ and $r = \binom{n}{1}$.
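Theorems 1 and 2 can be checked by brute force for small $n$. The following is a minimal sketch, assuming a transformation $t$ is stored as a tuple whose entry at position $i-1$ is $it$; the helper names (`compose`, `generated_semigroup`, `cycle`, `transposition`, `returning`) are hypothetical and not from the paper.

```python
from math import factorial

def compose(s, t):
    """Apply s first, then t (the left-to-right convention i(st) = (is)t)."""
    return tuple(t[x - 1] for x in s)

def generated_semigroup(gens):
    """All finite non-empty products of the given transformations."""
    seen = set(gens)
    frontier = list(seen)
    while frontier:
        s = frontier.pop()
        for g in gens:
            p = compose(s, g)
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

def cycle(n):
    """c = (1, 2, ..., n): i -> i + 1 for i < n, and n -> 1."""
    return tuple(range(2, n + 1)) + (1,)

def transposition(n):
    """t = (1, 2): swaps 1 and 2 and fixes everything else."""
    return (2, 1) + tuple(range(3, n + 1))

def returning(n):
    """r: sends n to 1 and fixes everything else."""
    return tuple(range(1, n)) + (1,)

for n in (3, 4):
    group = generated_semigroup([cycle(n), transposition(n)])
    monoid = generated_semigroup([cycle(n), transposition(n), returning(n)])
    # Compare the generated sizes with n! and n^n.
    print(n, len(group) == factorial(n), len(monoid) == n ** n)
```

Because the symmetric group is finite, the semigroup generated by $c$ and $t$ already contains the identity and all inverses, so the closure under products suffices.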

3 Quotient Complexity and Syntactic Complexity 

If $\Sigma$ is a non-empty finite alphabet, then $\Sigma^*$ is the free monoid generated by
$\Sigma$, and $\Sigma^+$ is the free semigroup generated by $\Sigma$. A word is any element of $\Sigma^*$,
and the empty word is $\varepsilon$. The length of a word $w \in \Sigma^*$ is $|w|$. A language over
$\Sigma$ is any subset of $\Sigma^*$. If $w = uxv$ for some $u, x, v \in \Sigma^*$, then $u$ is a prefix of
$w$, $v$ is a suffix of $w$, and $x$ is a factor of $w$. Both $u$ and $v$ are also factors of $w$.
A proper prefix (suffix, factor) of $w$ is a prefix (suffix, factor) of $w$ other than $w$.

The left quotient, or simply quotient, of a language $L$ by a word $w$ is the
language $L_w = \{x \in \Sigma^* \mid wx \in L\}$. For any $L \subseteq \Sigma^*$, the Nerode right
congruence [17] $\sim_L$ of $L$ is defined as follows:

$x \sim_L y$ if and only if $xv \in L \Leftrightarrow yv \in L$, for all $v \in \Sigma^*$.

Clearly, $L_x = L_y$ if and only if $x \sim_L y$. Thus each equivalence class of this right
congruence corresponds to a distinct quotient of $L$.



The Myhill congruence [16] $\approx_L$ of $L$ is defined as follows:

$x \approx_L y$ if and only if $uxv \in L \Leftrightarrow uyv \in L$ for all $u, v \in \Sigma^*$.

This congruence is also known as the syntactic congruence of $L$. The quotient
set $\Sigma^+/\approx_L$ of equivalence classes of the relation $\approx_L$ is a semigroup called
the syntactic semigroup of $L$, and $\Sigma^*/\approx_L$ is the syntactic monoid of $L$. The
syntactic complexity $\sigma(L)$ of $L$ is the cardinality of its syntactic semigroup.
The monoid complexity $\mu(L)$ of $L$ is the cardinality of its syntactic monoid. If
the equivalence class containing $\varepsilon$ is a singleton in the syntactic monoid, then
$\sigma(L) = \mu(L) - 1$; otherwise, $\sigma(L) = \mu(L)$.

A deterministic finite automaton (DFA) is a quintuple $\mathcal{A} = (Q, \Sigma, \delta, q_1, F)$,
where $Q$ is a finite, non-empty set of states, $\Sigma$ is a finite non-empty alphabet,
$\delta : Q \times \Sigma \to Q$ is the transition function, $q_1 \in Q$ is the initial state, and $F \subseteq Q$
is the set of accepting states. We extend $\delta$ to $Q \times \Sigma^*$ in the usual way. The DFA
$\mathcal{A}$ accepts a word $w \in \Sigma^*$ if $\delta(q_1, w) \in F$. The set of all words accepted by $\mathcal{A}$ is
$L(\mathcal{A})$. By the language of a state $q$ of $\mathcal{A}$ we mean the language accepted by the
DFA $(Q, \Sigma, \delta, q, F)$. A state is empty if its language is empty.

Let $L$ be a regular language. The quotient DFA of $L$ is $\mathcal{A} = (Q, \Sigma, \delta, q_1, F)$,
where $Q = \{L_w \mid w \in \Sigma^*\}$, $\delta(L_w, a) = L_{wa}$, $q_1 = L_\varepsilon = L$, and $F = \{L_w \mid \varepsilon \in L_w\}$.
The number $\kappa(L)$ of distinct quotients of $L$ is the quotient complexity of $L$. The
quotient DFA of $L$ is the minimal DFA accepting $L$, and so quotient complexity
is the same as state complexity, but there are advantages to using quotients [2].

In terms of automata, each equivalence class $[w]_{\sim_L}$ of $\sim_L$ is the set of all
words $w$ that take the automaton to the same state from the initial state, and
each equivalence class $[w]_{\approx_L}$ of $\approx_L$ is the set of all words that perform the
same transformation on the set of states [15]. In terms of quotients, $[w]_{\sim_L}$ is
the set of words $w$ that can be followed by the same quotient $L_w$.

Let $\mathcal{A} = (Q, \Sigma, \delta, q_1, F)$ be a DFA. For each word $w \in \Sigma^*$, the transition
function for $w$ defines a transformation $t_w$ of $Q$ by the word $w$: for all $i \in Q$,
$it_w \overset{\mathrm{def}}{=} \delta(i, w)$. The set $T_\mathcal{A}$ of all such transformations by non-empty words forms
a subsemigroup of $\mathcal{T}_Q$, called the transition semigroup of $\mathcal{A}$ [10]. Conversely, we
can use a set $\{t_a \mid a \in \Sigma\}$ of transformations to define $\delta$, and so the DFA $\mathcal{A}$.
When the context is clear we simply write $a = t$, where $t$ is a transformation of
$Q$, to mean that the transformation performed by $a \in \Sigma$ is $t$.

If $\mathcal{A}$ is the quotient DFA of $L$, then $T_\mathcal{A}$ is isomorphic to the syntactic
semigroup $T_L$ of $L$ [15], and we represent elements of $T_L$ by transformations in $T_\mathcal{A}$.

We attempt to obtain tight upper bounds on the syntactic complexity $\sigma(L) = |T_L|$
of $L$ as a function of the quotient complexity $\kappa(L)$ of $L$. First we consider
the syntactic complexity of regular languages over a unary alphabet, where the
concepts prefix-, suffix-, bifix-, and factor-free coincide. So we may consider only
unary prefix-free regular languages $L$ with quotient complexity $\kappa(L) = n$. When
$n = 1$, the only prefix-free language is $L = \emptyset$ with $\sigma(L) = 1$. For $n \ge 2$, a
prefix-free language $L$ must be a singleton, $L = \{a^{n-2}\}$. The syntactic semigroup $T_L$
of $L$ consists of $n - 1$ transformations $t_w$ by words $w = a^i$, where $1 \le i \le n - 1$.
Thus we have

Proposition 1 (Unary Free Regular Languages). If $L$ is a unary free regular
language with $\kappa(L) = n \ge 2$, then $\sigma(L) = n - 1$.

The tight upper bound for regular unary languages [11] is n. 
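The count in Proposition 1 can be reproduced by computing the transition semigroup of the minimal DFA of $\{a^{n-2}\}$. This is a minimal sketch under the assumption that the states are numbered $1, \ldots, n$, with $n-1$ accepting and $n$ empty; the function names are hypothetical.

```python
def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[x - 1] for x in s)

def transition_semigroup(letters):
    """Transformations performed by all non-empty words over the given letters."""
    seen = set(letters)
    frontier = list(seen)
    while frontier:
        s = frontier.pop()
        for g in letters:
            p = compose(s, g)
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

def unary_letter(n):
    """Letter a in the minimal DFA of {a^(n-2)}: i -> i + 1, and the empty state n -> n."""
    return tuple(min(i + 1, n) for i in range(1, n + 1))

def unary_syntactic_complexity(n):
    """Size of the transition semigroup, which equals sigma(L) because the DFA is minimal."""
    return len(transition_semigroup([unary_letter(n)]))

for n in range(2, 7):
    print(n, unary_syntactic_complexity(n))
```

The powers $a, a^2, \ldots, a^{n-1}$ are pairwise distinct, and $a^{n-1}$ already sends every state to the empty state, which is why the semigroup stops growing there.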

We assume that $|\Sigma| \ge 2$ in the following sections. Since the syntactic
semigroup of a language is the same as that of its complement, we deal only with
prefix-, suffix-, bifix-, and factor-free languages. All the syntactic complexity
results, however, apply also to the complements of these languages.

4 Prefix-Free Regular Languages 

To simplify notation we write $\varepsilon$ for the language $\{\varepsilon\}$. Recall that a regular
language $L$ is prefix-free if and only if it has exactly one accepting quotient, and
that quotient is $\varepsilon$ [10].

Theorem 3 (Prefix-Free Regular Languages). If $L$ is regular and prefix-free
with $\kappa(L) = n \ge 2$, then $\sigma(L) \le n^{n-2}$. Moreover, this bound is tight for
$n = 2$ if $|\Sigma| \ge 1$, for $n = 3$ if $|\Sigma| \ge 2$, for $n = 4$ if $|\Sigma| \ge 4$, and for $n \ge 5$ if
$|\Sigma| \ge n + 1$.

Proof. If $L$ is prefix-free, the only accepting quotient of $L$ is $\varepsilon$. Thus $L$ also has
the empty quotient, since $\varepsilon_a = \emptyset$ for $a \in \Sigma$. Let $\mathcal{A} = (Q, \Sigma, \delta, 1, \{n-1\})$ be
the quotient DFA of $L$, where, without loss of generality, $n - 1 \in Q$ is the only
accepting state, and $n \in Q$ is the empty state. For any transformation $t \in T_L$,
$(n-1)t = nt = n$. Thus we have $\sigma(L) \le n^{n-2}$.

The only prefix-free regular language for $n = 1$ is $L = \emptyset$ with $\sigma(L) = 1$;
here the bound $n^{n-2}$ does not apply. For $n = 2$ and $\Sigma = \{a\}$, the language
$L = \varepsilon$ meets the bound. For $n = 3$ and $\Sigma = \{a, b\}$, $L = b^*a$ meets the bound.
For $n \ge 4$, let $\mathcal{A}_n = (\{1, 2, \ldots, n\}, \{a, b, c, d_1, d_2, \ldots, d_{n-2}\}, \delta, 1, \{n-1\})$, where
$a = \binom{n-1}{n}(1, 2, \ldots, n-2)$, $b = \binom{n-1}{n}(1, 2)$, $c = \binom{n-1}{n}\binom{n-2}{1}$, and $d_i = \binom{n-1}{n}\binom{i}{n-1}$
for $i = 1, 2, \ldots, n-2$. DFA $\mathcal{A}_6$ is shown in Fig. 1, where $\Gamma = \{d_1, d_2, \ldots, d_{n-2}\}$.
For $n = 4$, input $a$ coincides with $b$; hence only 4 inputs are needed.



Fig. 1. Quotient DFA $\mathcal{A}_6$ of prefix-free regular language with 1,296 transformations.



Any transformation $t \in T_L$ has the form

$$t = \begin{pmatrix} 1 & 2 & \cdots & n-2 & n-1 & n \\ i_1 & i_2 & \cdots & i_{n-2} & n & n \end{pmatrix},$$

where $i_k \in \{1, 2, \ldots, n\}$ for $1 \le k \le n - 2$. There are three cases:

1. If $i_k \le n - 2$ for all $k$, $1 \le k \le n - 2$, then by Theorem 2, $\mathcal{A}_n$ can do $t$.

2. If $i_k \le n - 1$ for all $k$, $1 \le k \le n - 2$, and there exists some $h$ such that
$i_h = n - 1$, then there exists some $j$, $1 \le j \le n - 2$, such that $i_k \ne j$ for all $k$,
$1 \le k \le n - 2$. For all $1 \le k \le n - 2$, define $i'_k$ as follows: $i'_k = j$ if $i_k = n - 1$,
and $i'_k = i_k$ if $i_k < n - 1$. Let

$$s = \begin{pmatrix} 1 & 2 & \cdots & n-2 & n-1 & n \\ i'_1 & i'_2 & \cdots & i'_{n-2} & n & n \end{pmatrix}.$$

By Case 1 above, $\mathcal{A}_n$ can do $s$. Since $t = sd_j$, $\mathcal{A}_n$ can do $t$ as well.

3. Otherwise, there exists some $h$ such that $i_h = n$. Then there exists some $j$,
$1 \le j \le n - 2$, such that $i_k \ne j$ for all $k$, $1 \le k \le n - 2$. For all $1 \le k \le n - 2$,
define $i'_k$ as follows: $i'_k = n - 1$ if $i_k = n$, $i'_k = j$ if $i_k = n - 1$, and $i'_k = i_k$
otherwise. Let $s$ be as above but with new $i'_k$. By Case 2 above, $\mathcal{A}_n$ can do
$s$. Since $t = sd_j$, $\mathcal{A}_n$ can do $t$ as well.

Therefore, the syntactic complexity of An meets the desired bound. □ 
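The tightness claim of Theorem 3 can be checked numerically for small $n$. The sketch below builds the letters of $\mathcal{A}_n$ as we read them from the construction, assuming that every letter sends both the accepting state $n-1$ and the empty state $n$ to $n$ (this is forced by prefix-freeness, but the exact form of $a$ and $b$ is our reading of the text). The function names are hypothetical.

```python
def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[x - 1] for x in s)

def generated_semigroup(gens):
    """All finite non-empty products of the given transformations."""
    seen = set(gens)
    frontier = list(seen)
    while frontier:
        s = frontier.pop()
        for g in gens:
            p = compose(s, g)
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

def prefix_free_letters(n):
    """Letters a, b, c, d_1..d_{n-2} of A_n for n >= 4 (our reading of the construction)."""
    m = n - 2
    base = list(range(1, n + 1))
    base[n - 2] = n                 # accepting state n-1 goes to the empty state n
    a = base[:]
    for i in range(1, m + 1):
        a[i - 1] = i % m + 1        # cycle (1, 2, ..., n-2)
    b = base[:]
    b[0], b[1] = 2, 1               # transposition (1, 2)
    c = base[:]
    c[m - 1] = 1                    # singular transformation n-2 -> 1
    ds = []
    for i in range(1, m + 1):
        d = base[:]
        d[i - 1] = n - 1            # d_i sends i to the accepting state n-1
        ds.append(tuple(d))
    return [tuple(a), tuple(b), tuple(c)] + ds

for n in (4, 5, 6):
    size = len(generated_semigroup(prefix_free_letters(n)))
    print(n, size, n ** (n - 2))
```

For $n = 4$ the letters $a$ and $b$ coincide, so the generated semigroup effectively uses four distinct inputs, in line with the remark in the proof.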

We conjecture that the alphabet sizes cannot be reduced. As shown in
Table 2 on p. 26, we have verified this conjecture for $n \le 5$ by enumerating all
prefix-free regular languages with $n \le 5$ using GAP [8].

5 Suffix-Free Regular Languages

For any regular language $L$, a quotient $L_w$ is uniquely reachable [5] if $L_w = L_x$
implies that $w = x$. It is known from [3] that, if $L$ is a suffix-free regular language,
then $L = L_\varepsilon$ is uniquely reachable by $\varepsilon$, and $L$ has the empty quotient. Without
loss of generality, we assume that 1 is the initial state, and $n$ is the empty state.
We will show that the cardinality of $B_{sf}(n)$, defined below, is an upper bound (B
for "bound") on the syntactic complexity of suffix-free regular languages with
quotient complexity $n$. Let

$$B_{sf}(n) = \{t \in \mathcal{T}_Q \mid 1 \notin \mathrm{rng}(t),\ nt = n, \text{ and for all } j \ge 1,\ 1t^j = n \text{ or } 1t^j \ne it^j\ \ \forall i,\ 1 < i < n\}.$$

Proposition 2. If $L$ is a regular language with quotient DFA $\mathcal{A}_n = (Q, \Sigma, \delta, 1, F)$
and syntactic semigroup $T_L$, then the following hold:

1. If $L$ is suffix-free, then $T_L$ is a subset of $B_{sf}(n)$.

2. If $L$ has the empty quotient, only one accepting quotient, and $T_L \subseteq B_{sf}(n)$,
then $L$ is suffix-free.

Proof. 1. Let $L$ be suffix-free, and let $\mathcal{A}_n$ be its quotient DFA. Consider an
arbitrary $t \in T_L$. Since the quotient $L$ is uniquely reachable, $it \ne 1$ for all $i \in Q$.
Since the quotient corresponding to state $n$ is empty, $nt = n$. Since $L$ is
suffix-free, for any two quotients $L_w$ and $L_{uw}$, where $u, v, w \in \Sigma^+$, $w = v^j$ for some
$j \ge 1$, and $L_w \ne \emptyset$, we must have $L_w \cap L_{uw} = \emptyset$, and so $L_w \ne L_{uw}$. This means
that, for any $t \in T_L$ and $j \ge 1$, if $1t^j \ne n$, then $1t^j \ne it^j$ for all $i$, $1 < i < n$. So
$t \in B_{sf}(n)$, and $T_L \subseteq B_{sf}(n)$.

2. Assume that $T_L \subseteq B_{sf}(n)$, and let $f$ be the only accepting state. If $L$ is not
suffix-free, then there exist non-empty words $u$ and $v$ such that $v, uv \in L$. Let $t_u$
and $t_v$ be the transformations by $u$ and $v$, and let $i = 1t_u$; then $i \ne 1$. Assume
without loss of generality that $n$ is the empty state. Then $f \ne n$, and we have
$1t_v = f = 1t_{uv} = 1t_u t_v = it_v$, which contradicts the fact that $t_v \in B_{sf}(n)$.
Therefore $L$ is suffix-free. □
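For small $n$ the set $B_{sf}(n)$ can be enumerated directly from its definition. The following is a minimal sketch, assuming transformations are stored as tuples with entry $i-1$ equal to $it$; the names `in_Bsf` and `Bsf` are hypothetical. It checks the condition on $1t^j$ for every distinct power of $t$, which covers all $j \ge 1$ because the powers of $t$ eventually repeat.

```python
from itertools import product

def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[x - 1] for x in s)

def in_Bsf(t):
    """Membership test for B_sf(n), taken literally from the definition."""
    n = len(t)
    if 1 in t or t[n - 1] != n:
        return False                      # need 1 not in rng(t) and nt = n
    power, seen = t, set()
    while power not in seen:              # run through all distinct powers t^j, j >= 1
        seen.add(power)
        one = power[0]                    # the state 1 t^j
        if one != n and one in power[1:n - 1]:
            return False                  # some i with 1 < i < n has i t^j = 1 t^j != n
        power = compose(power, t)
    return True

def Bsf(n):
    """All elements of B_sf(n), found by exhaustive search over candidate images."""
    candidates = (imgs + (n,) for imgs in product(range(2, n + 1), repeat=n - 1))
    return [t for t in candidates if in_Bsf(t)]

for n in range(3, 7):
    print(n, len(Bsf(n)))
```

The search space has $(n-1)^{n-1}$ candidates, so this is only practical for small $n$; the sizes it produces can be compared with the values of $|B_{sf}(n)|$ quoted later in this section.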

Let $b_{sf}(n) = |B_{sf}(n)|$. We now prove that $b_{sf}(n)$ is an upper bound on the
syntactic complexity of suffix-free regular languages.

With each transformation $t$ of $Q$, we associate a directed graph $G_t$, where $Q$
is the set of nodes, and $(i, j) \in Q \times Q$ is a directed edge from $i$ to $j$ if $it = j$. We
call such a graph $G_t$ the transition graph of $t$. For each node $i$, there is exactly
one edge leaving $i$ in $G_t$. Consider the infinite sequence $i, it, it^2, \ldots$ for any $i \in Q$.
Since $Q$ is finite, there exists least $j \ge 0$ such that $it^{j+1} = it^{j'}$ for some $j' \le j$.
Then the finite sequence $S_t(i) = i, it, \ldots, it^j$ contains all the distinct elements
of the above infinite sequence, and it induces a directed path $P_t(i)$ from $i$ to $it^j$
in $G_t$. In particular, if $n \in S_t(1)$, and $nt = n$, then we call $S_t(1)$ the principal
sequence of $t$, and $P_t(1)$, the principal path of $G_t$.

Proposition 3. There exists a principal sequence for every transformation $t$
in $B_{sf}(n)$.

Proof. Suppose $t \in B_{sf}(n)$ and $S_t(1) = 1, 1t, \ldots, 1t^j$. If $t$ does not have a principal
sequence, then $n \notin S_t(1)$ and $1t^{j+1} = 1t^{j'} \ne n$ for some $j' \le j$. Let $i = 1t^{j+1-j'}$;
then $i \ne 1$ and $1t^{j'} = it^{j'}$, violating the last property of $B_{sf}(n)$. Therefore there
is a principal sequence for every $t \in B_{sf}(n)$. □

Fix a transformation $t \in B_{sf}(n)$. Let $i \in Q$ be such that $i \notin S_t(1)$. If the
sequence $S_t(i)$ does not contain any element of the principal sequence $S_t(1)$ other
than $n$, then we say that $S_t(i)$ has no principal connection. Otherwise, there exists
least $j \ge 1$ such that $1t^j \ne n$ and $1t^j = it^{j'} \in S_t(i)$ for some $j' \ge 1$, and we say
that $S_t(i)$ has a principal connection at $1t^j$. If $j' < j$, the principal connection is
short; otherwise, it is long.

Lemma 1. For all $t \in B_{sf}(n)$ and $i \notin S_t(1)$, the sequence $S_t(i)$ has no long
principal connection.

Proof. Let $t$ be any transformation in $B_{sf}(n)$. Suppose for some $i \notin S_t(1)$, the
sequence $S_t(i)$ has a long principal connection at $1t^j = it^{j'} \ne n$, where $j \le j'$.
Hence $it^{j'-j} \ne n$, and $1t^j = (it^{j'-j})t^j$, which is a contradiction. Therefore, for
all $i \notin S_t(1)$, $S_t(i)$ has no long principal connection. □

To calculate the cardinality of $B_{sf}(n)$, we need the following observation.

Lemma 2. For all $t \in B_{sf}(n)$ and $i \notin S_t(1)$, if $S_t(i)$ has a principal connection,
then there is no cycle incident to the path $P_t(i)$ in the transition graph $G_t$.

Proof. This observation can be derived from Theorem 1.2.9 of [7]. However, our
proof is shorter. Pick any $i \notin S_t(1)$ such that $S_t(i)$ has a principal connection
at $1t^j = it^{j'}$ for some $j$ and $j'$. Then the sequence $S_t(i)$ contains $n$, and the
path $P_t(i)$ does not contain any cycle. Suppose $C$ is a cycle which includes node
$x = it^k \in P_t(i)$. Since there is only one outgoing edge for each node in $G_t$, the
cycle $C$ must be oriented and must contain a node $x' \notin P_t(i)$ such that $(x', x)$ is
an edge in $C$. Then the next node in the cycle must be $it^{k+1}$, since there is only
one outgoing edge from $x$. But then $x'$ can never be reached from $P_t(i)$, and so
no such cycle can exist. □

By Lemma 2, for any $1t^j \in S_t(1)$, where $j \ge 1$, the union of directed paths
from various nodes $i$ to $1t^j$, if $i \notin S_t(1)$ and $S_t(i)$ has a principal connection
at $1t^j$, forms a labeled tree $T_t(j)$ rooted at $1t^j$. Suppose there are $r_j + 1$ nodes
in $T_t(j)$ for each $j$, and suppose there are $r$ elements of $Q$ that are not in the
principal sequence $S_t(1)$ nor in any tree $T_t(j)$, for some $r_j, r \ge 0$. Note that $1t^j$
is the only node in $T_t(j)$ that is also in the principal sequence $S_t(1)$. Each tree
$T_t(j)$ has height at most $j - 1$; otherwise, some $i \in T_t(j)$ has a long principal
connection. In particular, tree $T_t(1)$ has height 0; so it is trivial with only one
node $1t$. Then $r_1 = 0$, and we need only consider trees $T_t(j)$ for $j \ge 2$. Let
$S_m(h)$ be the number of labeled rooted trees with $m$ nodes and height at most
$h$. This number can be found in the paper of Riordan [21]; the calculation is
somewhat complex, and we refer the reader to [21] for details. For convenience,
we include the values of $S_m(h)$ for small values of $m$ and $h$ in Table 1, where
the row number is $h$ and the column number is $m$.

Table 1. The number $S_m(h)$ of labeled rooted trees with $m$ nodes and height at most $h$.

 h \ m   ...     6        7
   5     ...    ...     112609
   6     ...    7776    117649



Since each of the $m$ nodes can be the root, there are $S'_m(h) = S_m(h)/m$ labeled
trees rooted at a fixed node and having $m$ nodes and height at most $h$. The
following is an example of trees $T_t(j)$ in transformations $t \in B_{sf}(n)$.

Example 1. Let $n = 15$. Consider any transformation $t \in B_{sf}(15)$ with principal
sequence $S_t(1) = 1, 2, 3, 4, 5, 15$. There are 9 elements of $Q$ that are not in $S_t(1)$,
and some of them are in the trees $T_t(j)$ for $2 \le j \le 4$. Consider the cases where
$r_2 = 2$, $r_3 = 3$, $r_4 = 1$, and $r = 3$. Fig. 2 shows one such transformation $t$.

Fig. 2. Transition graph of some $t \in B_{sf}(15)$ with principal sequence 1, 2, 3, 4, 5, 15.

For $j = 2$, the tree $T_t(2)$ has height at most 1, and there are $S'_{r_2+1}(1) = S_3(1)/3 = 3/3 = 1$
possible $T_t(2)$. For $j = 3$, there are $S'_{r_3+1}(2) = S_4(2)/4 = 40/4 = 10$
possible $T_t(3)$, which are of one of the three types shown in Fig. 3. Among the
10 possible $T_t(3)$, one is of type (a), three are of type (b), and six are of type (c).
For $j = 4$, there are $S'_{r_4+1}(3) = S_2(3)/2 = 2/2 = 1$ possible $T_t(4)$.

Fig. 3. Three types (a), (b), (c) of trees of the form $T_t(3)$, where $\{i_1, i_2, i_3\} = \{8, 9, 10\}$.
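The tree counts used in Example 1 can be recomputed independently. The sketch below does not follow Riordan's derivation; it assumes the standard exponential-generating-function recursion $T_0(x) = x$, $T_h(x) = x\exp(T_{h-1}(x))$ for rooted labeled trees of height at most $h$, and reads $S_m(h)$ off as $m!$ times the coefficient of $x^m$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def series_exp(f, deg):
    """exp of a power series f with f[0] == 0, truncated at degree deg."""
    g = [Fraction(1)] + [Fraction(0)] * deg
    for k in range(1, deg + 1):
        # From g' = f' g:  k * g_k = sum_{j=1..k} j * f_j * g_{k-j}
        g[k] = sum(j * f[j] * g[k - j] for j in range(1, k + 1)) / k
    return g

def rooted_trees(m, h):
    """S_m(h): labeled rooted trees with m >= 1 nodes and height at most h."""
    T = [Fraction(0)] * (m + 1)
    T[1] = Fraction(1)                   # T_0(x) = x, the single-node tree
    for _ in range(h):
        e = series_exp(T, m)
        T = [Fraction(0)] + e[:m]        # T_h(x) = x * exp(T_{h-1}(x))
    return int(T[m] * factorial(m))

def fixed_root_trees(m, h):
    """S'_m(h) = S_m(h) / m: the same trees with a prescribed root."""
    return rooted_trees(m, h) // m

print(fixed_root_trees(3, 1), fixed_root_trees(4, 2), fixed_root_trees(2, 3))
```

Exact rational arithmetic is used so that the coefficients stay exact; for the small values of $m$ needed here, speed is not a concern.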



Let $C^n_k$ be the binomial coefficient, and let $C^n_{k_1,\ldots,k_m}$ be the multinomial
coefficient. Then we have

Lemma 3. For $n \ge 3$, we have

$$b_{sf}(n) = \sum_{k=0}^{n-2} C^{n-2}_k\, k! \sum_{r_2 + \cdots + r_k + r = n-k-2} C^{n-k-2}_{r_2,\ldots,r_k,r}\,(r+1)^r \prod_{j=2}^{k} S'_{r_j+1}(j-1). \qquad (1)$$

Proof. Let $t$ be any transformation in $B_{sf}(n)$. Suppose $S_t(1) = 1, 1t, \ldots, 1t^k, n$
for some $k$, $0 \le k \le n - 2$. There are $C^{n-2}_k k!$ different principal sequences $S_t(1)$.
Now, fix $S_t(1)$. Suppose $n - k - 2 = r_2 + \cdots + r_k + r$, where, for $2 \le j \le k$,
tree $T_t(j)$ contains $r_j + 1$ nodes, for some $r_j \ge 0$. There are $C^{n-k-2}_{r_2,\ldots,r_k,r}$ different
tuples $(r_2, \ldots, r_k, r)$. Each tree $T_t(j)$ has height at most $j - 1$, and it is rooted
at $1t^j$. There are $S'_{r_j+1}(j-1) = S_{r_j+1}(j-1)/(r_j+1)$ different trees $T_t(j)$. Let $E$ be the
set of the remaining $r$ elements $x$ of $Q$ that are not in any tree $T_t(j)$ nor in the
principal sequence $S_t(1)$. The image $xt$ can only be chosen from $E \cup \{n\}$. There
are $(r+1)^r$ different mappings of $E$. Altogether we have the desired formula. □

From Proposition 2 and Lemma 3 we have

Proposition 4. For $n \ge 3$, if $L$ is a suffix-free regular language with quotient
complexity $n$, then its syntactic complexity $\sigma(L)$ satisfies $\sigma(L) \le b_{sf}(n)$,
where $b_{sf}(n)$ is the cardinality of $B_{sf}(n)$, and it is given by Equation (1).
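The counting formula of Lemma 3 can be evaluated mechanically for small $n$. The sketch below is one possible reading of that formula: the multinomial coefficient is expanded as a product of binomials, and $S'_m(h)$ is computed with a standard generating-function recursion rather than with Riordan's method. All function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def fixed_root_trees(m, h):
    """S'_m(h): labeled trees on m nodes with a fixed root and height at most h."""
    # EGF recursion T_0 = x, T_h = x * exp(T_{h-1}), coefficients kept up to x^m.
    T = [Fraction(0)] * (m + 1)
    T[1] = Fraction(1)
    for _ in range(h):
        e = [Fraction(1)] + [Fraction(0)] * m
        for k in range(1, m + 1):
            e[k] = sum(j * T[j] * e[k - j] for j in range(1, k + 1)) / k
        T = [Fraction(0)] + e[:m]
    return int(T[m] * factorial(m)) // m

def b_sf(n):
    """Evaluate the sum of Lemma 3 for n >= 3."""
    total = 0
    for k in range(0, n - 1):            # principal sequence 1, 1t, ..., 1t^k, n
        sequences = comb(n - 2, k) * factorial(k)

        def distribute(j, left):
            """Sum over (r_j, ..., r_k, r) with r_j + ... + r_k + r = left."""
            if j > k:
                return (left + 1) ** left        # r = left: each image lies in E or is n
            return sum(comb(left, rj)
                       * fixed_root_trees(rj + 1, j - 1)
                       * distribute(j + 1, left - rj)
                       for rj in range(left + 1))

        total += sequences * distribute(2, n - k - 2)
    return total

print([b_sf(n) for n in range(3, 7)])
```

The values produced this way can be compared with the sizes $|B_{sf}(4)| = 15$, $|B_{sf}(5)| = 115$ and $|B_{sf}(6)| = 1169$ reported in the text.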

Note that $B_{sf}(n)$ is not a semigroup for $n \ge 4$ because $s_1 = [2, 3, n, \ldots, n, n]$,
$s_2 = [n, 3, 3, \ldots, 3, n] \in B_{sf}(n)$, but $s_1 s_2 = [3, 3, n, \ldots, n, n] \notin B_{sf}(n)$. Hence,
although $b_{sf}(n)$ is an upper bound on the syntactic complexity of suffix-free
regular languages, that bound is not tight. Our objective is to find the largest
subset of $B_{sf}(n)$ that is a semigroup. Let

$$W^{\le 5}_{sf}(n) = \{t \in B_{sf}(n) \mid \text{for all } i, j \in Q \text{ where } i \ne j, \text{ we have } it = jt = n \text{ or } it \ne jt\},$$

where W stands for "witness".
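The failure of closure described above is easy to replay. This is a small sketch, assuming tuples for transformations and a membership test written directly from the definition of $B_{sf}(n)$; the helper names are hypothetical.

```python
def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[x - 1] for x in s)

def in_Bsf(t):
    """Membership in B_sf(n): 1 not in rng(t), nt = n, and the condition on 1 t^j."""
    n = len(t)
    if 1 in t or t[n - 1] != n:
        return False
    power, seen = t, set()
    while power not in seen:              # all distinct powers t^j, j >= 1
        seen.add(power)
        one = power[0]
        if one != n and one in power[1:n - 1]:
            return False
        power = compose(power, t)
    return True

def s1(n):
    """[2, 3, n, ..., n, n]"""
    return (2, 3) + (n,) * (n - 2)

def s2(n):
    """[n, 3, 3, ..., 3, n]"""
    return (n,) + (3,) * (n - 2) + (n,)

for n in range(4, 8):
    p = compose(s1(n), s2(n))
    print(n, in_Bsf(s1(n)), in_Bsf(s2(n)), p, in_Bsf(p))
```

In the product, states 1 and 2 are both sent to state 3, which is not the empty state, so the last condition of $B_{sf}(n)$ fails already for $j = 1$.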

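Proposition 5. For $n \ge 3$, $W^{\le 5}_{sf}(n)$ is a semigroup contained in $B_{sf}(n)$, and
its cardinality is

$$w^{\le 5}_{sf}(n) = |W^{\le 5}_{sf}(n)| = \sum_{k=1}^{n-1} C^{n-1}_k\,(n-1-k)!\,C^{n-2}_{n-1-k}.$$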

Proof. We know that any $t$ is in $W^{\le 5}_{sf}(n)$ if and only if the following hold:

1. $it \ne 1$ for all $i \in Q$, and $nt = n$;

2. for all $i, j \in Q$, such that $i \ne j$, either $it = jt = n$ or $it \ne jt$.

Clearly $W^{\le 5}_{sf}(n) \subseteq B_{sf}(n)$. For any transformations $t_1, t_2 \in W^{\le 5}_{sf}(n)$,
consider the composition $t_1 t_2$. Since $1 \notin \mathrm{rng}(t_2)$, we have $1 \notin \mathrm{rng}(t_1 t_2)$. We also
have $nt_1t_2 = nt_2 = n$. Pick any $i, j \in Q$ such that $i \ne j$. Suppose $it_1t_2 \ne n$ or
$jt_1t_2 \ne n$. If $it_1t_2 = jt_1t_2$, then $it_1 = jt_1$ and thus $i = j$, a contradiction. Hence
$t_1t_2 \in W^{\le 5}_{sf}(n)$, and $W^{\le 5}_{sf}(n)$ is a semigroup contained in $B_{sf}(n)$.

Let $t \in W^{\le 5}_{sf}(n)$ be any transformation. Note that $nt = n$ is fixed. Let
$Q' = Q \setminus \{n\}$, and $Q'' = Q \setminus \{1, n\}$. Suppose $k$ elements in $Q'$ are mapped to $n$
by $t$, where $0 \le k \le n - 1$; then there are $C^{n-1}_k$ choices of these elements. For the
set $D$ of the remaining $n - 1 - k$ elements, which must be mapped by $t$ to pairwise
distinct elements of $Q''$, there are $C^{n-2}_{n-1-k}(n-1-k)!$ choices for the mapping
$t|_D$. When $k = 0$, there is no such $t$ since $|Dt| = n - 1 > n - 2 = |Q''|$. Altogether,
the cardinality of $W^{\le 5}_{sf}(n)$ is $|W^{\le 5}_{sf}(n)| = \sum_{k=1}^{n-1} C^{n-1}_k (n-1-k)!\, C^{n-2}_{n-1-k}$. □
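The cardinality formula and the closure property of Proposition 5 can be cross-checked by exhaustive enumeration for small $n$. The sketch below assumes the two-condition characterization stated at the start of the proof as the membership test; the function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[x - 1] for x in s)

def in_W_le5(t):
    """Conditions 1 and 2 from the proof: 1 not in rng(t), nt = n, injective off n."""
    n = len(t)
    if 1 in t or t[n - 1] != n:
        return False
    images = [x for x in t if x != n]
    return len(images) == len(set(images))

def W_le5(n):
    """All such transformations of {1, ..., n}, by exhaustive search."""
    candidates = (imgs + (n,) for imgs in product(range(2, n + 1), repeat=n - 1))
    return [t for t in candidates if in_W_le5(t)]

def w_le5_formula(n):
    """The sum over k = 1..n-1 given in Proposition 5."""
    return sum(comb(n - 1, k) * factorial(n - 1 - k) * comb(n - 2, n - 1 - k)
               for k in range(1, n))

for n in range(3, 7):
    print(n, len(W_le5(n)), w_le5_formula(n))

def is_closed(elements):
    """Check that the product of any two elements stays in the set."""
    pool = set(elements)
    return all(compose(s, t) in pool for s in pool for t in pool)

print(is_closed(W_le5(4)), is_closed(W_le5(5)))
```

For $n = 4$ and $n = 5$ the enumeration can be compared with the values 13 and 73 used in the proof of Theorem 5.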

We now construct a generating set $G^{\le 5}_{sf}(n)$ (G for "generators") of size $n$
for $W^{\le 5}_{sf}(n)$, which will show that there exist DFAs accepting suffix-free regular
languages with quotient complexity $n$ and syntactic complexity $w^{\le 5}_{sf}(n)$.

Proposition 6. When $n \ge 3$, the semigroup $W^{\le 5}_{sf}(n)$ is generated by the
following set $G^{\le 5}_{sf}(n)$ of transformations of $Q$: $G^{\le 5}_{sf}(3) = \{a, b\}$, where $a = [3, 2, 3]$
and $b = [2, 3, 3]$; $G^{\le 5}_{sf}(4) = \{a, b, c\}$, where $a = [4, 3, 2, 4]$, $b = [2, 4, 3, 4]$,
$c = [2, 3, 4, 4]$; and for $n \ge 5$, $G^{\le 5}_{sf}(n) = \{a_0, \ldots, a_{n-1}\}$, where

- $a_0 = \binom{1}{n}(2, 3)$,

- $a_1 = \binom{1}{n}(2, 3, \ldots, n-1)$,

- For $2 \le i \le n - 1$, $ja_i = j + 1$ for $j = 1, \ldots, i - 1$, $ia_i = n$, and $ja_i = j$ for
$j = i + 1, \ldots, n$.

Proof. First note that $G^{\le 5}_{sf}(n)$ is a subset of $W^{\le 5}_{sf}(n)$, and so $\langle G^{\le 5}_{sf}(n) \rangle$, the
semigroup generated by $G^{\le 5}_{sf}(n)$, is a subset of $W^{\le 5}_{sf}(n)$. We now show that
$W^{\le 5}_{sf}(n) \subseteq \langle G^{\le 5}_{sf}(n) \rangle$.

Pick any $t$ in $W^{\le 5}_{sf}(n)$. Note that $nt = n$ is fixed. Let $Q' = Q \setminus \{n\}$,
$E_t = \{j \in Q' \mid jt = n\}$, $D_t = Q' \setminus E_t$, and $Q'' = Q \setminus \{1, n\}$. Then $D_t t \subseteq Q''$, and
$|E_t| \ge 1$, since $|Q''| < |Q'|$. We prove by induction on $|E_t|$ that $t \in \langle G^{\le 5}_{sf}(n) \rangle$.

First, note that $\langle a_0, a_1 \rangle$, the semigroup generated by $\{a_0, a_1\}$, is isomorphic
to the symmetric group $\mathfrak{S}_{Q''}$ by Theorem 1. Consider $E_t = \{i\}$ for some $i \in Q'$.
Then $ia_i = it = n$. Moreover, since $D_t a_i, D_t t \subseteq Q''$, there exists $\pi \in \langle a_0, a_1 \rangle$
such that $(ja_i)\pi = jt$ for all $j \in D_t$. Then $t = a_i\pi \in \langle G^{\le 5}_{sf}(n) \rangle$.

Assume that any transformation $t \in W^{\le 5}_{sf}(n)$ with $|E_t| < k$ can be generated
by $G^{\le 5}_{sf}(n)$, where $1 < k \le n - 1$. Consider $t \in W^{\le 5}_{sf}(n)$ with $|E_t| = k$. Suppose
$E_t = \{e_1, \ldots, e_{k-1}, e_k\}$. Let $s \in W^{\le 5}_{sf}(n)$ be such that $E_s = \{e_1, \ldots, e_{k-1}\}$.
By assumption, $s$ can be generated by $G^{\le 5}_{sf}(n)$. Let $i = e_k s$; then $i \in Q''$, and
$e_j(sa_i) = n$ for all $1 \le j \le k$. Moreover, we have $D_t(sa_i) \subseteq Q''$. Thus, there exists
$\pi \in \langle a_0, a_1 \rangle$ such that, for all $d \in D_t$, $d(sa_i\pi) = dt$. Altogether, for all $e_j \in E_t$,
we have $e_j(sa_i\pi) = e_jt = n$, for all $d \in D_t$, $d(sa_i\pi) = dt$, and $n(sa_i\pi) = nt = n$.
Thus $t = sa_i\pi$, and $t \in \langle G^{\le 5}_{sf}(n) \rangle$.

Therefore $W^{\le 5}_{sf}(n) = \langle G^{\le 5}_{sf}(n) \rangle$. □



Theorem 4. For $n \ge 5$, let $\mathcal{A}_n = (Q, \Sigma, \delta, 1, F)$ be the DFA with alphabet $\Sigma =
\{a_0, a_1, \ldots, a_{n-1}\}$, where each $a_i$ defines a transformation as in Proposition 6,
and $F = \{2\}$. Then $L = L(\mathcal{A}_n)$ has quotient complexity $\kappa(L) = n$, and syntactic
complexity $\sigma(L) = w^{\le 5}_{sf}(n)$. Moreover, $L$ is suffix-free.

Proof. First we show that all the states of $\mathcal{A}_n$ are reachable: 1 is the initial
state, state $n$ is reached by $a_1$, and for $2 \le i \le n - 1$, state $i$ is reached by
$a_{n-1}^{i-1}$. Also, the initial state 1 accepts $a_2$ while state $i$ rejects $a_2$ for all $i \ne 1$. For
$2 \le i \le n - 1$, state $i$ accepts $a_1^{n-i}$, while state $j$ rejects it, for all $j \ne i$. Also $n$
is the empty state. Thus all the states of $\mathcal{A}_n$ are distinct, and $\kappa(L) = n$.

By Proposition 6, the syntactic semigroup of $L$ is $W^{\le 5}_{sf}(n)$. The syntactic
complexity of $L$ is $\sigma(L) = |W^{\le 5}_{sf}(n)| = w^{\le 5}_{sf}(n)$. Also, by Proposition 2, $L$ is
suffix-free. □
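The generating set of Proposition 6 can be tested against a direct enumeration for small $n$. The sketch below encodes $a_0, \ldots, a_{n-1}$ as we read them (with $a_0$ and $a_1$ sending state 1 to $n$), closes them under composition, and compares the result with the set of transformations that avoid 1, fix $n$, and are injective away from $n$. The function names are hypothetical.

```python
from itertools import product

def compose(s, t):
    """Apply s first, then t."""
    return tuple(t[x - 1] for x in s)

def generated_semigroup(gens):
    """All finite non-empty products of the given transformations."""
    seen = set(gens)
    frontier = list(seen)
    while frontier:
        s = frontier.pop()
        for g in gens:
            p = compose(s, g)
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

def generators_le5(n):
    """a_0, ..., a_{n-1} for n >= 4 (our reading of Proposition 6)."""
    a0 = [n, 3, 2] + list(range(4, n + 1))                   # 1 -> n, swap 2 and 3
    a1 = [n] + [i + 1 for i in range(2, n - 1)] + [2, n]     # 1 -> n, cycle (2, ..., n-1)
    gens = [tuple(a0), tuple(a1)]
    for i in range(2, n):
        # j -> j + 1 for j < i, i -> n, and j -> j for j > i
        gens.append(tuple(list(range(2, i + 1)) + [n] + list(range(i + 1, n + 1))))
    return gens

def witness_set(n):
    """Transformations with 1 not in the range, nt = n, and injective off n."""
    result = set()
    for imgs in product(range(2, n + 1), repeat=n - 1):
        off_n = [x for x in imgs if x != n]
        if len(off_n) == len(set(off_n)):
            result.add(imgs + (n,))
    return result

for n in (4, 5, 6):
    print(n, len(generated_semigroup(generators_le5(n))), len(witness_set(n)))
```

For $n = 4$ the letters $a_0$ and $a_1$ coincide, which matches the three-letter generating set listed separately for that case.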

As shown in Table 2 on p. 26, the size of $\Sigma$ cannot be decreased for $n \le 5$.

Theorem 5. For $2 \le n \le 5$, if a suffix-free regular language $L$ has quotient
complexity $\kappa(L) = n$, then its syntactic complexity satisfies $\sigma(L) \le w^{\le 5}_{sf}(n)$,
and this is a tight upper bound.

Proof. By Proposition 2, the syntactic semigroup of a suffix-free regular language
$L$ is contained in $B_{sf}(n)$. For $n \in \{2, 3\}$, $w^{\le 5}_{sf}(n) = b_{sf}(n)$. So $w^{\le 5}_{sf}(n)$ is an
upper bound, and it is met by the language $L = \varepsilon$ for $n = 2$ and by $L = ab^*$
for $n = 3$. For $n = 4$, we have $|B_{sf}(4)| = 15$ and $|W^{\le 5}_{sf}(4)| = 13$. Two
transformations, $s_1 = [4, 2, 2, 4]$ and $s_2 = [4, 3, 3, 4]$, in $B_{sf}(4)$ are such that $s_1$
conflicts with $t_1 = [3, 2, 4, 4] \in W^{\le 5}_{sf}(4)$ ($t_1s_1 = [2, 2, 4, 4] \notin B_{sf}(4)$), and $s_2$
conflicts with $t_2 = [2, 3, 4, 4]$ ($t_2s_2 = [3, 3, 4, 4] \notin B_{sf}(4)$). Thus $\sigma(L) \le 13$. Let
$L = (b \cup c)((a \cup c)b^*a)^*$; then $\kappa(L) = 4$ and $\sigma(L) = 13$. So the bound is tight.

For $n = 5$, we have $|B_{sf}(5)| = 115$ and $|W^{\le 5}_{sf}(5)| = 73$. Let $B_{sf}(5) \setminus W^{\le 5}_{sf}(5) =
\{s_1, \ldots, s_{42}\}$. For each $s_i$, we enumerated transformations in $W^{\le 5}_{sf}(5)$ using GAP
and found a unique $t_i \in W^{\le 5}_{sf}(5)$ such that the semigroup $\langle t_i, s_i \rangle$ is not contained
in $B_{sf}(5)$. Thus at most one transformation in each pair $\{t_i, s_i\}$ can appear in the
syntactic semigroup of $L$. So we reduce the upper bound to 73. By Theorem 4,
this bound is tight.

For $n \ge 6$, the semigroup $W^{\le 5}_{sf}(n)$ is no longer the largest semigroup
contained in $B_{sf}(n)$. In the following, we define and study another semigroup $W^{\ge 6}_{sf}(n)$,
which is a larger semigroup contained in $B_{sf}(n)$. Let

$$W^{\ge 6}_{sf}(n) = \{t \in B_{sf}(n) \mid 1t = n \text{ or } it = n\ \ \forall i,\ 2 \le i \le n - 1\}.$$

Note that we are interested only in situations where $n \ge 6$, although some
statements also hold for smaller $n$.

Proposition 7. For $n \ge 6$, the set $W^{\ge 6}_{sf}(n)$ is a semigroup contained in $B_{sf}(n)$,
and its cardinality is

$$w^{\ge 6}_{sf}(n) = |W^{\ge 6}_{sf}(n)| = (n-1)^{n-2} + (n-2).$$

Proof. Pick any $t_1, t_2$ in $W^{\ge 6}_{sf}(n)$. If $1t_1 = n$, then $1(t_1t_2) = n$ and $t_1t_2 \in
W^{\ge 6}_{sf}(n)$. If $1t_1 \ne n$, then, for all $i \in \{2, \ldots, n-1\}$, $it_1 = n$ and $i(t_1t_2) = n$; so
$t_1t_2 \in W^{\ge 6}_{sf}(n)$ as well. Hence $W^{\ge 6}_{sf}(n)$ is a semigroup contained in $B_{sf}(n)$.

For any $t \in W^{\ge 6}_{sf}(n)$, $nt = n$ is fixed. There are two possible cases:

1. $1t = n$: For each $i \in \{2, \ldots, n-1\}$, $it$ can be chosen from $\{2, \ldots, n\}$. Then
there are $(n-1)^{n-2}$ different $t$'s in this case.

2. $1t \ne n$: Now $1t$ can be chosen from $\{2, \ldots, n-1\}$. For each $i \in \{2, \ldots, n-1\}$,
$it = n$ is fixed. There are $n - 2$ different $t$'s in this case.

Therefore $w^{\ge 6}_{sf}(n) = (n-1)^{n-2} + (n-2)$. □

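Proposition 8. For $n \ge 6$, the semigroup $W^{\ge 6}_{sf}(n)$ is generated by the set
$G^{\ge 6}_{sf}(n) = \{a_1, a_2, a_3, b_1, \ldots, b_{n-2}, c\}$ of transformations, where

1. $a_1 = \binom{1}{n}(2, \ldots, n-1)$, $a_2 = \binom{1}{n}(2, 3)$, $a_3 = \binom{1}{n}\binom{n-1}{2}$;

2. $b_i = \binom{1}{n}\binom{i+1}{n}$ for $1 \le i \le n - 2$;

3. $c = [2, n, \ldots, n]$.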

Proof. Clearly $G^{\ge 6}_{sf}(n) \subseteq W^{\ge 6}_{sf}(n)$, and $\langle G^{\ge 6}_{sf}(n) \rangle \subseteq W^{\ge 6}_{sf}(n)$. We show in the
following that $W^{\ge 6}_{sf}(n) \subseteq \langle G^{\ge 6}_{sf}(n) \rangle$.

Let $Q' = \{2, \ldots, n-1\}$. By Theorem 2, $a_1$, $a_2$ and $a_3$ together generate the
semigroup

$$Y = \{t \in W^{\ge 6}_{sf}(n) \mid \text{for all } i \in Q',\ it \in Q'\},$$

which is isomorphic to $\mathcal{T}_{Q'}$ and is contained in $W^{\ge 6}_{sf}(n)$. Next, consider any
$t \in W^{\ge 6}_{sf}(n) \setminus Y$. We have two cases:

1. $1t = n$: Let $E_t = \{i \in Q' \mid it = n\}$. Since $t \notin Y$, $E_t \ne \emptyset$. Suppose
$E_t = \{i_1, \ldots, i_k\}$ for some $1 \le k \le n - 2$. Then there exists $t' \in Y$ such
that, for all $i \notin E_t$, $it' = it$. Let $s = b_{i_1-1} \cdots b_{i_k-1}$. Note that $E_t s = \{n\}$,
and, for all $i \notin E_t$, $i(t's) = (it')s = it$. So $t = t's \in \langle G^{\ge 6}_{sf}(n) \rangle$.

2. $1t \ne n$: If $1t = 2$, then $t = c$. Otherwise, $1t \in \{3, \ldots, n-1\} \subseteq Q'$, and we
know from the above case that there exists $t' \in \langle G^{\ge 6}_{sf}(n) \rangle$ such that $2t' = 1t$.
Then $1(ct') = 1t$, and $i(ct') = (ic)t' = n = it$, for all $i \in Q'$. Hence $t =
ct' \in \langle G^{\ge 6}_{sf}(n) \rangle$.

Therefore $\langle a_1, a_2, a_3, b_1, \ldots, b_{n-2}, c \rangle = W^{\ge 6}_{sf}(n)$. □

Theorem 6. For n^6, let A'„ = (Q, i:, 5, 1, F) be the DFA with alphabet S = 
{ai, a2, 03, 61, ... , bn^2, c} of size n+2, where each letter defines a transformation 
as in Proposition\^ and F = {2}. Then L' = L{An) has quotient complexity 
k{L') — n and syntactic complexity cr(i') = w^^(n). 

Proof. First we show that $\kappa(L') = n$. From the initial state, we can reach state
2 by $c$ and state $n$ by $a_1$. From state 2 we can reach state $i$, $3 \le i \le n-1$, by
$a_1^{i-2}$. So all the states in $Q$ are reachable. Now, the initial state accepts $c$, but
all other states reject it. For $2 \le i \le n-2$, state $i$ accepts $a_1^{n-i}$, while all other
states reject it. State $n$ is the empty state, which rejects all words. Thus all the
states in $Q$ are distinct.

By Proposition 8, the syntactic semigroup of $L'$ is $W^{\ge 6}_{sf}(n)$, and $\sigma(L') =
w^{\ge 6}_{sf}(n)$. Also $L'$ is suffix-free by Proposition [21 □

We know that the upper bound on the syntactic complexity of suffix-free
regular languages is achieved by the largest semigroup contained in $B_{sf}(n)$. We
conjecture that $W^{\ge 6}_{sf}(n)$ is such a semigroup.

Conjecture 1 (Suffix-Free Regular Languages). If $L$ is a suffix-free regular language
with $\kappa(L) = n \ge 6$, then $\sigma(L) \le w^{\ge 6}_{sf}(n)$ and this is a tight bound.

We prove the conjecture for n = 6: 

Proof. For $n = 6$, $|B_{sf}(6)| = 1169$ and $|W^{\ge 6}_{sf}(6)| = 629$. Let $\{s_1, \ldots, s_{540}\} =
B_{sf}(6) \setminus W^{\ge 6}_{sf}(6)$. For each $i$, we enumerated transformations in $W^{\ge 6}_{sf}(6)$ using
GAP and found a unique $t_i \in W^{\ge 6}_{sf}(6)$ such that $\langle t_i, s_i \rangle$ is not contained in
$B_{sf}(6)$. As in the proof of Theorem [SI for each $i$, at most one transformation
in $\{t_i, s_i\}$ can appear in the syntactic semigroup of $L$. Then we can reduce the
upper bound to 629. This bound is met by the language $L'$ in Theorem 6, so it
is tight. □



6 Bifix-Free Regular Languages 

Let $L$ be a regular bifix-free language with $\kappa(L) = n$. From Sections 4 and 5 we
have:

1. $L$ has $\varepsilon$ as a quotient, and this is the only accepting quotient;

2. $L$ has $\emptyset$ as a quotient;

3. $L$ as a quotient is uniquely reachable.

Let $\mathcal{A}$ be the quotient DFA of $L$, with $Q$ as the set of states. We assume that
1 is the initial state, $n-1$ corresponds to the quotient $\varepsilon$, and $n$ is the empty
state. Consider the set

$B_{bf}(n) = \{t \in B_{sf}(n) \mid (n-1)t = n\}$.

The following is an observation similar to Proposition [21 

Proposition 9. If $L$ is a regular language with quotient complexity $n$ and syntactic
semigroup $T_L$, then the following hold:

1. If $L$ is bifix-free, then $T_L$ is a subset of $B_{bf}(n)$.

2. If $\varepsilon$ is the only accepting quotient of $L$, and $T_L \subseteq B_{bf}(n)$, then $L$ is bifix-free.

Proof. 1. Since $L$ is suffix-free, $T_L \subseteq B_{sf}(n)$. Since $L$ is also prefix-free, it has
$\varepsilon$ and $\emptyset$ as quotients. By assumption, $n-1 \in Q$ corresponds to the quotient $\varepsilon$.
Thus for any $t \in T_L$, $(n-1)t = n$, and so $T_L \subseteq B_{bf}(n)$.

2. Since $\varepsilon$ is the only accepting quotient of $L$, $L$ is prefix-free, and $L$ has the
empty quotient. Since $T_L \subseteq B_{bf}(n) \subseteq B_{sf}(n)$, $L$ is suffix-free by Proposition [51
Therefore $L$ is bifix-free. □

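Lemma 4. For $n \ge 3$, we have $|B_{bf}(n)| = M_n + N_n$, where

$$M_n = \sum_{k=1}^{n-2} C^{n-3}_{k-1}\,(k-1)! \sum_{\substack{r_2+\cdots+r_k+r \\ = n-k-2}} C^{n-k-2}_{r_2,\ldots,r_k,r}\,(r+1)^r \prod_{j=2}^{k} S'_{r_j+1}(j-1), \tag{2}$$

$$N_n = \sum_{k=0}^{n-3} C^{n-3}_{k}\,k! \sum_{\substack{r_2+\cdots+r_k+r \\ = n-k-3}} C^{n-k-3}_{r_2,\ldots,r_k,r}\,(r+2)^r \prod_{j=2}^{k} S'_{r_j+1}(j-1). \tag{3}$$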

Proof. Let $t$ be any transformation in $B_{bf}(n)$. Suppose $S_t(1) = 1, 1t, \ldots, 1t^k, n$,
where $0 \le k \le n-2$. For $2 \le j \le k$, suppose tree $T_t(j)$ contains $r_j + 1$ nodes, for
some $r_j \ge 0$; then there are $S'_{r_j+1}(j-1)$ different trees $T_t(j)$. Let $E$ be the set of
elements of $Q$ that are not in any tree $T_t(j)$ nor in the principal sequence $S_t(1)$.
Then there are two cases:

1. $n-1 \in S_t(1)$: Since $(n-1)t = n$, we must have $1t^k = n-1$, and $k \ge 1$. So
there are $C^{n-3}_{k-1}(k-1)!$ different $S_t(1)$. Let $r = |E| = (n-k-2) - (r_2 + \cdots + r_k)$.
Then there are $C^{n-k-2}_{r_2,\ldots,r_k,r}$ tuples $(r_2, \ldots, r_k, r)$. For any $x \in E$, its image $xt$
can be chosen from $E \cup \{n\}$. Then the number of transformations $t$ in this
case is $M_n$.

2. $n-1 \notin S_t(1)$: Then $k \le n-3$, and there are $C^{n-3}_{k}\,k!$ different $S_t(1)$. Note
that $n-1 \in E$, and $(n-1)t = n$ is fixed. Let $r = |E \setminus \{n-1\}| = (n-k-3) -
(r_2 + \cdots + r_k)$. Then there are $C^{n-k-3}_{r_2,\ldots,r_k,r}$ tuples $(r_2, \ldots, r_k, r)$. For
any $x \in E \setminus \{n-1\}$, $xt$ can be chosen from $E \cup \{n\}$. Thus the number of
transformations $t$ in this case is $N_n$.

Altogether we have the desired formula. □
Let $b_{bf}(n) = |B_{bf}(n)|$. From Proposition 9 and Lemma 4 we have

Proposition 10. For $n \ge 3$, if $L$ is a bifix-free regular language with quotient
complexity $n$, then its syntactic complexity $\sigma(L)$ satisfies $\sigma(L) \le b_{bf}(n)$,
where $b_{bf}(n)$ is the cardinality of $B_{bf}(n)$ as in Lemma 4.

For $2 \le n \le 4$, the set $B_{bf}(n)$ is a semigroup. But for $n \ge 5$, it is not
a semigroup because $s_1 = [2, 3, n, \ldots, n, n]$, $s_2 = [n, 3, 3, n, \ldots, n, n] \in B_{bf}(n)$
while $s_1 s_2 = [3, 3, n, \ldots, n, n] \notin B_{bf}(n)$. Hence $b_{bf}(n)$ is not a tight upper bound
on the syntactic complexity of bifix-free regular languages in general. We look
for a large semigroup contained in $B_{bf}(n)$ that can be the syntactic semigroup
of a bifix-free regular language. Let

$W^{\le 5}_{bf}(n) = \{t \in B_{bf}(n) \mid \text{for all } i, j \in Q \text{ where } i \ne j, \text{ we have } it = jt = n \text{ or } it \ne jt\}$.

(The reason for using the superscript $\le 5$ will be made clear in Theorem 8.)
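
The counterexample $s_1 s_2$ and the small values of $b_{bf}(n)$ can be reproduced by brute force. The sketch below is illustrative only and rests on an assumption: it takes $B_{sf}(n)$ to be the set of transformations $t$ with $1 \notin \mathrm{rng}(t)$, $nt = n$, and, for all $j \ge 1$, $1t^j = n$ or $1t^j \ne it^j$ for $1 < i < n$ (the definition from the suffix-free section, which is not restated here). The names `compose`, `in_B_bf` and `count_B_bf` are hypothetical.

```python
from itertools import product

def compose(s, t):
    """Right action: i(st) = (is)t, on image tuples (1t, ..., nt)."""
    return tuple(t[s[i] - 1] for i in range(len(s)))

def in_B_bf(t):
    """Membership in B_bf(n), under the assumed definition of B_sf(n)."""
    n = len(t)
    if 1 in t or t[n - 1] != n or t[n - 2] != n:
        return False
    seen, power = set(), t             # power runs through t, t^2, t^3, ...
    while power not in seen:           # the powers of t are eventually periodic
        seen.add(power)
        one = power[0]
        if one != n and one in power[1:n - 1]:   # 1t^j = it^j != n for some i
            return False
        power = compose(power, t)
    return True

def count_B_bf(n):
    """Count B_bf(n) by enumerating the images of states 1, ..., n-2."""
    return sum(in_B_bf(imgs + (n, n))
               for imgs in product(range(2, n + 1), repeat=n - 2))

if __name__ == "__main__":
    n = 5
    s1 = (2, 3) + (n,) * (n - 2)
    s2 = (n, 3, 3) + (n,) * (n - 3)
    print(in_B_bf(s1), in_B_bf(s2), in_B_bf(compose(s1, s2)))
    print([count_B_bf(m) for m in range(3, 7)])
```

With this reading, the counts for $n = 3, 4, 5$ come out as 2, 7 and 41, matching the values of $b_{bf}(n)$ quoted in this section; the paper reports 339 for $n = 6$.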

Proposition 11. For $n \ge 3$, $W^{\le 5}_{bf}(n)$ is a semigroup contained in $B_{bf}(n)$ with
cardinality

$$w^{\le 5}_{bf}(n) = |W^{\le 5}_{bf}(n)| = \sum_{k=0}^{n-2} \left(C^{n-2}_{k}\right)^2 (n-2-k)!.$$

Proof. First, note that $W^{\le 5}_{bf}(n) = W^{\le 5}_{sf}(n) \cap B_{bf}(n)$, and that $W^{\le 5}_{sf}(n)$ is a
semigroup contained in $B_{sf}(n)$ by Proposition [5] For any $t_1, t_2 \in W^{\le 5}_{bf}(n)$, we
have $t_1 t_2 \in W^{\le 5}_{sf}(n)$, and $(n-1)t_1 t_2 = n t_2 = n$; so $t_1 t_2 \in B_{bf}(n)$. Then
$t_1 t_2 \in W^{\le 5}_{bf}(n)$, and $W^{\le 5}_{bf}(n)$ is a semigroup contained in $B_{bf}(n)$.

Pick any $t \in W^{\le 5}_{bf}(n)$. Note that $(n-1)t = n$ and $nt = n$ are fixed, and
$1 \notin \mathrm{rng}(t)$. Let $Q' = Q \setminus \{n-1, n\}$, $E = \{i \in Q' \mid it = n\}$, and $D = Q' \setminus E$.
Suppose $|E| = k$, where $0 \le k \le n-2$; then there are $C^{n-2}_{k}$ choices of $E$.
Elements of $D$ are mapped to pairwise different elements of $Q \setminus \{1, n\}$; then there
are $C^{n-2}_{n-2-k}(n-2-k)!$ different mappings $t|_D$. Altogether, we have
$|W^{\le 5}_{bf}(n)| = \sum_{k=0}^{n-2} \left(C^{n-2}_{k}\right)^2 (n-2-k)!$. □
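
The closed form can be compared with a direct count. The sketch below is illustrative and relies on the counting argument of the proof: it assumes that a transformation with $1 \notin \mathrm{rng}(t)$ and $(n-1)t = nt = n$ belongs to the set exactly when no two distinct states share an image other than $n$. The function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

def w_bf_le5_formula(n):
    """Sum over k = |E| of (C(n-2, k))^2 * (n-2-k)!."""
    return sum(comb(n - 2, k) ** 2 * factorial(n - 2 - k) for k in range(n - 1))

def w_bf_le5_bruteforce(n):
    """Count image choices for states 1, ..., n-2 (states n-1 and n map to n)."""
    count = 0
    for images in product(range(2, n + 1), repeat=n - 2):
        non_sink = [x for x in images if x != n]
        if len(non_sink) == len(set(non_sink)):   # injective away from n
            count += 1
    return count

if __name__ == "__main__":
    for n in range(3, 8):
        print(n, w_bf_le5_formula(n), w_bf_le5_bruteforce(n))
```

For $n = 5$ and $n = 6$ both counts give 34 and 209, the values used later in this section and in Table 3.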

Proposition 12. For $n \ge 3$, let $Q' = Q \setminus \{n-1, n\}$ and $Q'' = Q \setminus \{1, n\}$. Then
the semigroup $W^{\le 5}_{bf}(n)$ is generated by

$G^{\le 5}_{bf}(n) = \{t \in W^{\le 5}_{bf}(n) \mid Q't = Q'' \text{ and } it \ne jt \text{ for all distinct } i, j \in Q'\}$.

Proof. We want to show that $W^{\le 5}_{bf}(n) = \langle G^{\le 5}_{bf}(n) \rangle$. Since $G^{\le 5}_{bf}(n) \subseteq W^{\le 5}_{bf}(n)$,
we have $\langle G^{\le 5}_{bf}(n) \rangle \subseteq W^{\le 5}_{bf}(n)$. Let $t \in W^{\le 5}_{bf}(n)$. By definition, $(n-1)t = nt = n$.
Let $E_t = \{i \in Q' \mid it = n\}$. If $E_t = \emptyset$, then $t \in G^{\le 5}_{bf}(n)$; otherwise, there exists
$x \in Q''$ such that $x \notin \mathrm{rng}(t)$. We prove by induction on $|E_t|$ that $t \in \langle G^{\le 5}_{bf}(n) \rangle$.

First note that, for all $t \in G^{\le 5}_{bf}(n)$, $t|_{Q'}$ is an injective mapping from $Q'$ to
$Q''$. Consider $E_t = \{i\}$ for some $i \in Q'$. Since $|E_t| = 1$, $\mathrm{rng}(t) \cup \{x\} = Q''$. Let
$t_1, t_2 \in G^{\le 5}_{bf}(n)$ be defined by

1. $jt_1 = j+1$ for $j = 1, \ldots, i-1$, $it_1 = n-1$, $jt_1 = j$ for $j = i+1, \ldots, n-2$,
2. $1t_2 = x$, $jt_2 = (j-1)t$ for $j = 2, \ldots, i$, $jt_2 = jt$ for $j = i+1, \ldots, n-2$.

Then $t_1 t_2 = t$, and $t \in \langle G^{\le 5}_{bf}(n) \rangle$.

Assume that any transformation $t \in W^{\le 5}_{bf}(n)$ with $|E_t| < k$ can be generated
by $G^{\le 5}_{bf}(n)$, where $1 < k \le n-2$. Consider $t \in W^{\le 5}_{bf}(n)$ with $|E_t| = k$. Suppose
$E_t = \{e_1, \ldots, e_{k-1}, e_k\}$, and let $D_t = Q' \setminus E_t = \{d_1, \ldots, d_l\}$, where $l = n-2-k$.
By assumption, all $s \in W^{\le 5}_{bf}(n)$ with $|E_s| = k-1$ can be generated by $G^{\le 5}_{bf}(n)$.
Let $s$ be such that $E_s = \{1, \ldots, k-1\}$; then $1s = \cdots = (k-1)s = n$. In addition,
let $ks = x$, and let $(k+j)s = d_j t$ for $j = 1, \ldots, l$. Let $t' \in G^{\le 5}_{bf}(n)$ be such that
$e_j t' = j$ for $j = 1, \ldots, k-1$, $e_k t' = n-1$, and $d_j t' = k+j$ for $j = 1, \ldots, l$. Then
$t's = t$, and $t \in \langle G^{\le 5}_{bf}(n) \rangle$. Therefore, $W^{\le 5}_{bf}(n) = \langle G^{\le 5}_{bf}(n) \rangle$. □

Theorem 7. For $n \ge 3$, let $\mathcal{A}_n = (Q, \Sigma, \delta, 1, F)$ be the DFA with alphabet $\Sigma$ of
size $(n-2)!$, where each $a \in \Sigma$ defines a distinct transformation $t_a \in G^{\le 5}_{bf}(n)$,
and $F = \{n-1\}$. Then $L = L(\mathcal{A}_n)$ has quotient complexity $\kappa(L) = n$, and
syntactic complexity $\sigma(L) = w^{\le 5}_{bf}(n)$. Moreover, $L$ is bifix-free.

Proof. We first show that all the states of $\mathcal{A}_n$ are reachable. Note that there
exists $a \in \Sigma$ such that $t_a = [2, 3, \ldots, n-1, n, n] \in G^{\le 5}_{bf}(n)$. State $1 \in Q$ is the
initial state, and $a^{i-1}$ reaches state $i \in Q$ for $i = 2, \ldots, n$. Furthermore, for
$1 \le i \le n-1$, state $i$ accepts $a^{n-1-i}$, while for $j \ne i$, state $j$ rejects it. Also, $n$
is the empty state. Thus all the states of $\mathcal{A}_n$ are distinct, and $\kappa(L) = n$.

By Proposition 12, the syntactic semigroup of $L$ is $W^{\le 5}_{bf}(n)$. Hence the syntactic
complexity of $L$ is $\sigma(L) = w^{\le 5}_{bf}(n)$. By Proposition 9, $L$ is bifix-free. □

Theorem 8. For $2 \le n \le 5$, if a bifix-free regular language $L$ has quotient
complexity $\kappa(L) = n$, then $\sigma(L) \le w^{\le 5}_{bf}(n)$, and this bound is tight.

Proof. We know by Proposition 9 that the upper bound on the syntactic complexity
of bifix-free regular languages is reached by the largest semigroup contained
in $B_{bf}(n)$. Since $w^{\le 5}_{bf}(n) = b_{bf}(n)$ for $n = 2$, 3, and 4, $w^{\le 5}_{bf}(n)$ is an upper
bound, and it is tight by Theorem 7.

For $n = 5$, we have $b_{bf}(5) = |B_{bf}(5)| = 41$, and $w^{\le 5}_{bf}(5) = |W^{\le 5}_{bf}(5)| = 34$.
Let $B_{bf}(5) \setminus W^{\le 5}_{bf}(5) = \{\tau_1, \ldots, \tau_7\}$. We found for each $\tau_i$ a unique $t_i \in W^{\le 5}_{bf}(5)$
such that the semigroup $\langle \tau_i, t_i \rangle$ is not a subset of $B_{bf}(5)$:



    $\tau_1 = [2,4,4,5,5]$,    $t_1 = [3,4,2,5,5]$;
    $\tau_2 = [3,4,4,5,5]$,    $t_2 = [3,5,2,5,5]$;
    $\tau_3 = [4,2,2,5,5]$,    $t_3 = [2,4,3,5,5]$;
    $\tau_4 = [4,3,3,5,5]$,    $t_4 = [2,5,3,5,5]$;
    $\tau_5 = [5,2,2,5,5]$,    $t_5 = [3,2,4,5,5]$;
    $\tau_6 = [5,3,3,5,5]$,    $t_6 = [2,3,4,5,5]$;
    $\tau_7 = [5,4,4,5,5]$,    $t_7 = [3,2,5,5,5]$.



Since $\langle \tau_i, t_i \rangle \subseteq T_L$ if both $\tau_i$ and $t_i$ are in $T_L$, then $T_L \not\subseteq B_{bf}(5)$, and $L$ is
not bifix-free by Proposition 9. Thus, for $1 \le i \le 7$, at most one of $\tau_i$ and $t_i$ can
appear in $T_L$, and $|T_L| \le 34$. Since $|W^{\le 5}_{bf}(5)| = 34$ and $W^{\le 5}_{bf}(5)$ is a semigroup,
we have $\sigma(L) \le 34 = w^{\le 5}_{bf}(5)$ as the upper bound for $n = 5$. This bound is
reached by the DFA $\mathcal{A}_5$ in Theorem 7. □
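
The seven pairs listed in the proof can be re-checked mechanically. In the sketch below, which is an illustration and not the authors' GAP computation, the product $t_i \tau_i$ (apply $t_i$ first) is tested for membership in $B_{bf}(5)$. The membership test assumes the definition of $B_{sf}(n)$ from the suffix-free section: $1 \notin \mathrm{rng}(t)$, $nt = n$, and for all $j \ge 1$, $1t^j = n$ or $1t^j \ne it^j$ for $1 < i < n$. The names `PAIRS`, `compose` and `in_B_bf` are hypothetical.

```python
def compose(s, t):
    """Right action: i(st) = (is)t."""
    return tuple(t[s[i] - 1] for i in range(len(s)))

def in_B_bf(t):
    """Membership in B_bf(n) under the assumed definition of B_sf(n)."""
    n = len(t)
    if 1 in t or t[n - 1] != n or t[n - 2] != n:
        return False
    seen, power = set(), t
    while power not in seen:
        seen.add(power)
        one = power[0]
        if one != n and one in power[1:n - 1]:
            return False
        power = compose(power, t)
    return True

# (tau_i, t_i) as listed in the proof for n = 5.
PAIRS = [
    ((2, 4, 4, 5, 5), (3, 4, 2, 5, 5)),
    ((3, 4, 4, 5, 5), (3, 5, 2, 5, 5)),
    ((4, 2, 2, 5, 5), (2, 4, 3, 5, 5)),
    ((4, 3, 3, 5, 5), (2, 5, 3, 5, 5)),
    ((5, 2, 2, 5, 5), (3, 2, 4, 5, 5)),
    ((5, 3, 3, 5, 5), (2, 3, 4, 5, 5)),
    ((5, 4, 4, 5, 5), (3, 2, 5, 5, 5)),
]

if __name__ == "__main__":
    for tau, t in PAIRS:
        prod = compose(t, tau)
        print(tau, t, prod, in_B_bf(prod))
```

In each case the product $t_i \tau_i$ sends state 1 and another state to the same image different from 5, so it falls outside $B_{bf}(5)$, while $\tau_i$ and $t_i$ themselves pass the test.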



For $n \ge 6$, the semigroup $W^{\le 5}_{bf}(n)$ is no longer the largest semigroup contained
in $B_{bf}(n)$. We find another large semigroup $W^{\ge 6}_{bf}(n)$ suitable for bifix-free
regular languages. Let

$U^1_n = \{t \in B_{bf}(n) \mid 1t = n\}$,
$U^2_n = \{t \in B_{bf}(n) \mid 1t = n-1\}$,
$U^3_n = \{t \in B_{bf}(n) \mid 1t \notin \{n, n-1\}, \text{ and } it \in \{n-1, n\} \text{ for all } i \ne 1\}$,

and let $W^{\ge 6}_{bf}(n) = U^1_n \cup U^2_n \cup U^3_n$. When $2 \le n \le 4$, we have $W^{\ge 6}_{bf}(n) = W^{\le 5}_{bf}(n)$,
and these cases were already discussed. So we are only interested in larger values
of $n$.

Proposition 13. For $n \ge 5$, $W^{\ge 6}_{bf}(n)$ is a semigroup contained in $B_{bf}(n)$ with
cardinality

$$w^{\ge 6}_{bf}(n) = |W^{\ge 6}_{bf}(n)| = (n-1)^{n-3} + (n-2)^{n-3} + (n-3)2^{n-3}.$$

Proof. First we show that $U^1_n$ is a semigroup. For any $t_1, t'_1 \in U^1_n$, since $1(t_1 t'_1) =
(1t_1)t'_1 = nt'_1 = n$, we have $t_1 t'_1 \in U^1_n$. Next, let $t_2 \in U^2_n$ and $t \in U^1_n \cup U^2_n$. If
$t \in U^1_n$, then $1(t_2 t) = (n-1)t = n$ and $1(t t_2) = n t_2 = n$; so $t_2 t, t t_2 \in U^1_n$. If
$t \in U^2_n$, then $1(t_2 t) = (n-1)t = n$ and $1(t t_2) = (n-1)t_2 = n$; so $t_2 t, t t_2 \in U^1_n$
as well. Thus $U^1_n \cup U^2_n$ is also a semigroup. For any $t_3 \in U^3_n$ and $t' \in W^{\ge 6}_{bf}(n)$,
since $it_3 \in \{n-1, n\}$ for all $i \ne 1$, and $(n-1)t' = nt' = n$, we have $i(t_3 t') = n$,
and $t_3 t' \in W^{\ge 6}_{bf}(n)$. Also $1(t' t_3) = (1t')t_3 \in \{n-1, n\}$, so $t' t_3 \in U^1_n \cup U^2_n$. Hence
$W^{\ge 6}_{bf}(n)$ is a semigroup contained in $B_{bf}(n)$.

Note that $U^1_n$, $U^2_n$, and $U^3_n$ are pairwise disjoint. For any $t \in W^{\ge 6}_{bf}(n)$, there
are three cases:

1. $t \in U^1_n$: For any $i \notin \{1, n-1, n\}$, $it$ can be chosen from $Q \setminus \{1\}$. Then
$|U^1_n| = (n-1)^{n-3}$;

2. $t \in U^2_n$: For any $i \notin \{1, n-1, n\}$, $it$ can be chosen from $Q \setminus \{1, n-1\}$. Then
$|U^2_n| = (n-2)^{n-3}$;

3. $t \in U^3_n$: Now, $1t$ can be chosen from $Q \setminus \{1, n-1, n\}$. For any $i \notin \{1, n-1, n\}$,
$it$ has two choices: $it = n-1$ or $n$. Then $|U^3_n| = (n-3)2^{n-3}$.

Therefore we have $|W^{\ge 6}_{bf}(n)| = (n-1)^{n-3} + (n-2)^{n-3} + (n-3)2^{n-3}$. □
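
Both the cardinality and the closure under composition can be checked for small $n$. The sketch below is illustrative: it builds the three sets from the image choices described in the three counting cases (an assumption that these choices characterise the sets), and the names `w_bf_ge6_parts`, `compose` and `is_closed` are hypothetical.

```python
from itertools import product

def compose(s, t):
    """Right action: i(st) = (is)t."""
    return tuple(t[s[i] - 1] for i in range(len(s)))

def w_bf_ge6_parts(n):
    """Return (U1, U2, U3) as sets of image tuples (1t, ..., nt)."""
    def build(first, choices):
        # states 2, ..., n-2 pick images from `choices`; n-1 and n map to n
        return {(first,) + imgs + (n, n)
                for imgs in product(choices, repeat=n - 3)}

    u1 = build(n, range(2, n + 1))                                   # 1t = n
    u2 = build(n - 1, [x for x in range(2, n + 1) if x != n - 1])    # 1t = n-1
    u3 = set()
    for first in range(2, n - 1):                                    # 1t in {2..n-2}
        u3 |= build(first, (n - 1, n))
    return u1, u2, u3

def is_closed(ts):
    """True if the set of transformations is closed under composition."""
    return all(compose(s, t) in ts for s in ts for t in ts)

if __name__ == "__main__":
    for n in (5, 6):
        u1, u2, u3 = w_bf_ge6_parts(n)
        w = u1 | u2 | u3
        print(n, len(u1), len(u2), len(u3), len(w), is_closed(w))
```

For $n = 6$ the three parts have 125, 64 and 24 elements, giving the total 213 that appears in the proof of the conjecture for $n = 6$.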

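The next proposition describes a generating set of $W^{\ge 6}_{bf}(n)$.

Proposition 14. For $n \ge 5$, the semigroup $W^{\ge 6}_{bf}(n)$ is generated by $G^{\ge 6}_{bf}(n) =
\{a_1, a_2, a_3, b_1, \ldots, b_{n-3}, c_1, \ldots, c_m, d_1, \ldots, d_l\}$, where $m = (n-2)^{n-3} - 1$ and
$l = (n-3)(2^{n-3} - 1)$, and

1. $a_1 = \binom{1}{n}\binom{n-1}{n}(2, \ldots, n-2)$, $a_2 = \binom{1}{n}\binom{n-1}{n}(2,3)$, $a_3 = \binom{1}{n}\binom{n-1}{n}\binom{n-2}{2}$;

2. For $1 \le i \le n-3$, $b_i = \binom{1}{n}\binom{n-1}{n}\binom{i+1}{n-1}$;

3. Each $c_i$ defines a distinct transformation in $U^2_n$ other than $[n-1, n, \ldots, n, n]$;

4. Each $d_i$ defines a distinct transformation in $U^3_n$ other than $[j, n, \ldots, n, n]$ for
all $j \in \{2, \ldots, n-2\}$.

Proof. Since $G^{\ge 6}_{bf}(n) \subseteq W^{\ge 6}_{bf}(n)$, we have $\langle G^{\ge 6}_{bf}(n) \rangle \subseteq W^{\ge 6}_{bf}(n)$. It remains to
be shown that $W^{\ge 6}_{bf}(n) \subseteq \langle G^{\ge 6}_{bf}(n) \rangle$. Let $Q' = Q \setminus \{1, n-1, n\}$.

1. First consider $U^1_n$. By Theorem [2l $a_1$, $a_2$ and $a_3$ together generate the
semigroup

$Y' = \{t \in U^1_n \mid \text{for all } i \in Q',\ it \in Q'\}$,

which is contained in $U^1_n$. For any $t \in U^1_n \setminus Y'$, let $E_t = \{i \in Q' \mid it = n-1\}$;
then $E_t \ne \emptyset$. Suppose $E_t = \{i_1, \ldots, i_k\}$, where $1 \le k \le n-3$. Then there
exists $t' \in Y'$ such that, for all $i \notin E_t$, $it' = it$. Let $s = b_{i_1-1} \cdots b_{i_k-1}$. Note
that $E_t s = \{n-1\}$, and, for all $i \notin E_t$, $i(t's) = (it')s = it$. So $t's = t$, and
$\langle a_1, a_2, a_3, b_1, \ldots, b_{n-3} \rangle = U^1_n$.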

2. Next, the transformations that are in U2 u U3 but not in Gl^{n) are — 

[i,n,..., n, n], where 2 < i ^ n - 1. Note that d = (2) (";;^) e G^f^(n), 

and, for each i e {2, . . . , n - 1}, = (i) (2) £ U,i . Then t, = ds, G 
(G^f^(n)), and u C (G^/(n)). 

Therefore $W^{\ge 6}_{bf}(n) = \langle G^{\ge 6}_{bf}(n) \rangle$. □

Theorem 9. For $n \ge 5$, let $\mathcal{A}'_n = (Q, \Sigma, \delta, 1, F)$ be the DFA with alphabet $\Sigma$ of
size $(n-2)^{n-3} + (n-3)2^{n-3} + 2$, where each letter defines a transformation as
in Proposition 14, and $F = \{n-1\}$. Then $L' = L(\mathcal{A}'_n)$ has quotient complexity
$\kappa(L') = n$, and syntactic complexity $\sigma(L') = w^{\ge 6}_{bf}(n)$. Moreover, $L'$ is bifix-free.

Proof. First, for all $i \in Q \setminus \{1\}$, there exists $a \in \Sigma$ such that $t_a = [i, n, \ldots, n, n] \in
G^{\ge 6}_{bf}(n)$, and state $i$ is reachable by $a$. So all the states in $Q$ are reachable.
Next, there exist $b, c \in \Sigma$ such that $t_b = [n-1, n, \ldots, n, n] \in G^{\ge 6}_{bf}(n)$ and
$t_c = [n, 3, 4, \ldots, n, n] \in G^{\ge 6}_{bf}(n)$. The initial state accepts $b$, while all other
states reject it. For $2 \le i \le n-2$, state $i$ accepts $c^{n-1-i}$, while all other states
reject it. Also, state $n-1$ is the only accepting state, and state $n$ is the empty
state. Then all the states in $Q$ are distinct, and $\kappa(L') = n$.

By Proposition 14, the syntactic semigroup of $L'$ is $W^{\ge 6}_{bf}(n)$; so $\sigma(L') =
w^{\ge 6}_{bf}(n)$. By Proposition 9, $L'$ is bifix-free. □

Conjecture 2 (Bifix-Free Regular Languages). If $L$ is a bifix-free regular language
with $\kappa(L) = n \ge 6$, then $\sigma(L) \le w^{\ge 6}_{bf}(n)$ and this is a tight bound.

The conjecture holds for $n = 6$ as we now show:

Proof. When $n = 6$, $|B_{bf}(6)| = 339$ and $|W^{\ge 6}_{bf}(6)| = 213$. There are 126 transformations
$\tau_1, \ldots, \tau_{126}$ in $B_{bf}(6) \setminus W^{\ge 6}_{bf}(6)$. For each $\tau_i$, we enumerated transformations
in $W^{\ge 6}_{bf}(6)$ using GAP and found a unique $t_i \in W^{\ge 6}_{bf}(6)$ such that
$\langle t_i, \tau_i \rangle \not\subseteq B_{bf}(6)$. Thus, for each $i$, at most one of $t_i$ and $\tau_i$ can appear in the
syntactic semigroup $T_L$ of $L$. So we further lower the bound to $\sigma(L) \le 213$. This
bound is reached by the DFA $\mathcal{A}'_6$ in Theorem 9, so it is a tight upper bound
for $n = 6$. □



7 Factor-Free Regular Languages 



Let $L$ be a factor-free regular language with $\kappa(L) = n$. Since factor-free regular
languages are also bifix-free, $L$ as a quotient is uniquely reachable, $\varepsilon$ is the only
accepting quotient of $L$, and $L$ also has the empty quotient. As in Section 6, we
assume that $Q$ is the set of states of the quotient DFA of $L$, in which 1 is the initial
state, and states $n-1$ and $n$ correspond to the quotients $\varepsilon$ and $\emptyset$, respectively.
Let

$B_{ff}(n) = \{t \in B_{bf}(n) \mid \text{for all } j \ge 1,\ 1t^j = n-1 \Rightarrow it^j = n \ \ \forall i,\ 1 < i < n-1\}$.

We first have the following observation:

Proposition 15. If $L$ is a regular language with quotient complexity $n$ and syntactic
semigroup $T_L$, then the following hold:

1. If $L$ is factor-free, then $T_L$ is a subset of $B_{ff}(n)$.

2. If $\varepsilon$ is the only accepting quotient of $L$, and $T_L \subseteq B_{ff}(n)$, then $L$ is factor-free.

Proof. 1. Assume $L$ is factor-free. Then $L$ is bifix-free, and $T_L \subseteq B_{bf}(n)$ by
Proposition 9. For any transformation $t_w \in T_L$ performed by some non-empty
word $w$, if $1t_w^j = n-1$ for some $j \ge 1$, then $w^j \in L$. If we also have $it_w^j \ne n$ for
some $i \in Q \setminus \{1\}$, then $i \notin \{n-1, n\}$ as $(n-1)t = nt = n$ for all $t \in B_{ff}(n)$. Thus
there exist non-empty words $u$ and $v$ such that state $i$ is reachable by $u$, and
state $i(t_w^j)$ accepts $v$. So $uw^jv \in L$, which is a contradiction. Hence $T_L \subseteq B_{ff}(n)$.

2. Since $\varepsilon$ is the only accepting state and $B_{ff}(n) \subseteq B_{bf}(n)$, $L$ is bifix-free by
Proposition 9. If $L$ is not factor-free, then there exist non-empty words $u$, $v$ and
$w$ such that $w, uwv \in L$. Thus $1t_w = n-1$, and $1t_{uwv} = 1(t_u t_w t_v) = n-1$.
Since $L$ is bifix-free, $1t_u \ne 1$ and $nt_v = n$; thus $(1t_u)t_w \ne n$, which contradicts
the assumption that $t_w \in T_L \subseteq B_{ff}(n)$. Therefore $L$ is factor-free. □

The properties of suffix- and bifix-free regular languages still apply to factor-free
regular languages. Moreover, we have

Lemma 5. For all $t \in B_{ff}(n)$ and $i \notin S_t(1)$, if $n-1 \in S_t(1)$, then $n \in S_t(i)$.

Proof. Suppose $n-1 = 1t^k \in S_t(1)$ for some $k \ge 1$. If $n \notin S_t(i)$, then for all
$j \ge 1$, $it^j \ne n$. In particular, $it^k \ne n$, which contradicts the definition of $B_{ff}(n)$.
Therefore $n \in S_t(i)$. □

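Lemma 6. For $n \ge 3$, we have $|B_{ff}(n)| = N_n + O_n$, where

$$O_n = 1 + \sum_{k=2}^{n-2} C^{n-3}_{k-1}\,(k-1)! \sum_{\substack{r_2+\cdots+r_k+r \\ = n-k-2}} C^{n-k-2}_{r_2,\ldots,r_k,r}\, S'_{r+1}(k) \prod_{j=2}^{k} S'_{r_j+1}(j-1),$$

and $N_n$ is as given in Equation (3).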



Proof. Let $t \in B_{ff}(n)$ be any transformation. Suppose $S_t(1) = 1, 1t, \ldots, 1t^k, n$,
where $0 \le k \le n-2$. Then there are two cases:

1. $n-1 \in S_t(1)$. Since $(n-1)t = n$, we have $n-1 = 1t^k$, and $k \ge 1$. If $k = 1$,
then $1t = n-1$, and $it = n$ for all $i \ne 1$; such a $t$ is unique. Consider $k \ge 2$.
There are $C^{n-3}_{k-1}(k-1)!$ different $S_t(1)$. For $2 \le j \le k$, suppose there are
$r_j + 1$ nodes in tree $T_t(j)$; then there are $S'_{r_j+1}(j-1)$ such trees. Let $E$
be the set of elements $x$ that are not in any tree $T_t(j)$ nor in $S_t(1)$, and let
$r = |E| = (n-k-2) - (r_2 + \cdots + r_k)$. By Lemma 5, $n \in S_t(x)$ for all $x \in E$.
Then the union of paths $P_t(x)$ for all $x \in E$ forms a labeled tree rooted at $n$
with height at most $k$, and there are $S'_{r+1}(k)$ such trees. Thus the number
of transformations in this case is $O_n$.

2. $n-1 \notin S_t(1)$. Now, for all $j \ge 1$, $1t^j \ne n-1$. Then $t \in B_{bf}(n)$. As in the
proof of Lemma 4, the number of transformations in this case is $N_n$.

Altogether we have the desired formula. □

Let $b_{ff}(n) = |B_{ff}(n)|$. From Proposition 15 and Lemma 6 we have

Proposition 16. For $n \ge 3$, if $L$ is a factor-free regular language with quotient
complexity $n$, then its syntactic complexity $\sigma(L)$ satisfies $\sigma(L) \le b_{ff}(n)$,
where $b_{ff}(n)$ is the cardinality of $B_{ff}(n)$ as in Lemma 6.

The tight upper bound on the syntactic complexity of factor-free regular
languages is reached by the largest semigroup contained in $B_{ff}(n)$. When $2 \le
n \le 4$, $B_{ff}(n)$ is a semigroup. The languages $L_2 = \varepsilon$, $L_3 = a$ over alphabet $\{a, b\}$,
and $L_4 = ab^*a$ have syntactic complexities $1 = b_{ff}(2)$, $2 = b_{ff}(3)$, and $6 = b_{ff}(4)$,
respectively. So $b_{ff}(n)$ is a tight upper bound for $n \in \{2, 3, 4\}$. However, the
set $B_{ff}(n)$ is not a semigroup for $n \ge 5$, because $s_1 = [2, 3, \ldots, n-1, n, n]$, $s_2 =
[n, n-1, 3, \ldots, n-2, n, n] \in B_{ff}(n)$ but $s_1 s_2 = [n-1, 3, \ldots, n-2, n, n, n] \notin B_{ff}(n)$.

Next, we find a large semigroup that can be the syntactic semigroup of a 
factor-free regular language. 

Let $t_0 = [n-1, n, \ldots, n, n]$, and let $W_{ff}(n) = U^1_n \cup \{t_0\} \cup U^3_n$.

When $2 \le n \le 4$, we have $W_{ff}(n) = B_{ff}(n)$. So we are interested in larger values
of $n$ in the rest of this section.

Proposition 17. For $n \ge 5$, $W_{ff}(n)$ is a semigroup contained in $B_{ff}(n)$ with
cardinality

$$w_{ff}(n) = |W_{ff}(n)| = (n-1)^{n-3} + (n-3)2^{n-3} + 1.$$

Proof. As we have shown in the proof of Proposition 13, $U^1_n$ is a semigroup.
For any $t \in U^1_n \cup \{t_0\}$, since $t_0 \in U^2_n$, we have $t t_0, t_0 t \in U^1_n$; so $U^1_n \cup \{t_0\}$ is
also a semigroup. We also know that, for any $t_3 \in U^3_n$ and $t' \in W_{ff}(n)$, since
$W_{ff}(n) \subseteq W^{\ge 6}_{bf}(n)$, $i(t_3 t') = n$ for all $i \ne 1$; so $t_3 t' \in W_{ff}(n)$. If $t' \in U^1_n \cup \{t_0\}$,
then $1t't_3 = n$ and $t't_3 \in U^1_n$; otherwise, $t' \in U^3_n$, and $t't_3 = t_0$ or $t't_3 = [n, \ldots, n] \in U^1_n$.
Hence $W_{ff}(n)$ is a semigroup.

For any $t \in U^1_n$, since $1t = n$, we have $t \in B_{ff}(n)$. For any $t \in U^3_n$, $1t \ne n-1$,
and $it^2 = n$ for all $i \in \{2, \ldots, n\}$; then $t \in B_{ff}(n)$ as well. Clearly $t_0 \in B_{ff}(n)$.
Hence $W_{ff}(n)$ is contained in $B_{ff}(n)$.

We know that $|U^1_n| = (n-1)^{n-3}$ and $|U^3_n| = (n-3)2^{n-3}$. Therefore
$|W_{ff}(n)| = (n-1)^{n-3} + (n-3)2^{n-3} + 1$. □

We now describe a generating set of Wff(n). 

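Proposition 18. For $n \ge 5$, the semigroup $W_{ff}(n)$ is generated by $G_{ff}(n) =
\{a_1, a_2, a_3, b_1, \ldots, b_{n-3}, c_1, \ldots, c_m\}$, where $m = (n-3)(2^{n-3} - 1)$, and

1. $a_1 = \binom{1}{n}\binom{n-1}{n}(2, \ldots, n-2)$, $a_2 = \binom{1}{n}\binom{n-1}{n}(2,3)$, $a_3 = \binom{1}{n}\binom{n-1}{n}\binom{n-2}{2}$;

2. For $1 \le i \le n-3$, $b_i = \binom{1}{n}\binom{n-1}{n}\binom{i+1}{n-1}$;

3. Each $c_i$ defines a distinct transformation in $U^3_n$ other than $[j, n, \ldots, n, n]$ for
all $j \in \{2, \ldots, n-2\}$.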

Proof. We know from the proof of Proposition [T3] that is generated by 
{ai, 02, 03, 61, . . . , &„_3}. Also, the transformations that are in {to} U but 
not in Gff(n) are tj — [j, n, . . . ,n,n], where j G {2, . . . , ri — 1}. Each tj is a com- 

position of d = Q G G^,'{n) and = (i) ("5) Q G . Therefore 

(Gff(n)) = Wff(n). □ 

Theorem 10. For $n \ge 5$, let $\mathcal{A}_n = (Q, \Sigma, \delta, 1, F)$ be the DFA with alphabet
$\Sigma = \{a_1, a_2, a_3, b_1, \ldots, b_{n-3}, c_1, \ldots, c_m\}$ of size $(n-3)2^{n-3} + 3$, where each letter
defines a transformation as in Proposition 18, and $F = \{n-1\}$. Then $L = L(\mathcal{A}_n)$
has quotient complexity $\kappa(L) = n$, and syntactic complexity $\sigma(L) = w_{ff}(n)$.
Moreover, $L$ is factor-free.

Proof. Since $G_{ff}(n) \subseteq G^{\ge 6}_{bf}(n)$, the DFA $\mathcal{A}_n$ can be obtained from the DFA $\mathcal{A}'_n$
of Theorem 9 by restricting the alphabet. The words used to show that all the
states of $\mathcal{A}'_n$ are reachable and distinct still exist in $\mathcal{A}_n$. Then we have $\kappa(L) = n$.
By Proposition 18, the syntactic semigroup of $L$ is $W_{ff}(n)$; so $\sigma(L) = w_{ff}(n)$.
By Proposition 15, $L$ is factor-free. □

Conjecture 3 (Factor-Free Regular Languages). If $L$ is a factor-free regular language
with $\kappa(L) = n$, where $n \ge 5$, then $\sigma(L) \le w_{ff}(n)$ and this is a tight
upper bound.

We prove the conjecture for $n = 5$ and 6.

Proof. For $n = 5$, $|B_{ff}(5)| = 31$, and $|W_{ff}(5)| = 25$. There are 6 transformations
$\tau_1, \ldots, \tau_6$ in $B_{ff}(5) \setminus W_{ff}(5)$. For each $\tau_i$, $1 \le i \le 6$, we found a unique $t_i \in W_{ff}(5)$
such that $\langle t_i, \tau_i \rangle \not\subseteq B_{ff}(5)$:



    $\tau_1 = [2,3,4,5,5]$,    $t_1 = [5,2,2,5,5]$,
    $\tau_2 = [2,3,5,5,5]$,    $t_2 = [5,4,2,5,5]$,
    $\tau_3 = [2,5,3,5,5]$,    $t_3 = [5,3,3,5,5]$,
    $\tau_4 = [3,2,5,5,5]$,    $t_4 = [5,2,4,5,5]$,
    $\tau_5 = [3,4,2,5,5]$,    $t_5 = [5,3,2,5,5]$,
    $\tau_6 = [3,5,2,5,5]$,    $t_6 = [5,3,4,5,5]$.



For each $1 \le i \le 6$, at most one of $t_i$ and $\tau_i$ can appear in the syntactic
semigroup of a factor-free regular language $L$. Then $\sigma(L) = |T_L| \le 25$. By
Theorem 10, this upper bound is tight for $n = 5$.

For $n = 6$, $|B_{ff}(6)| = 246$, and $|W_{ff}(6)| = 150$. There are 96 transformations
$\tau_1, \ldots, \tau_{96}$ in $B_{ff}(6) \setminus W_{ff}(6)$. For each $\tau_i$, $1 \le i \le 96$, we enumerated the
transformations in $W_{ff}(6)$ using GAP and found a unique $t_i \in W_{ff}(6)$ such that
$\langle t_i, \tau_i \rangle \not\subseteq B_{ff}(6)$. Thus 150 is a tight upper bound for $n = 6$. □
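
The $n = 5$ part of this argument can be replayed in a few lines. The sketch below is an illustration, not the authors' GAP session: it generates the semigroup $\langle \tau_i, t_i \rangle$ for each listed pair and looks for an element outside $B_{ff}(5)$. It assumes the definition of $B_{sf}(n)$ from the suffix-free section ($1 \notin \mathrm{rng}(t)$, $nt = n$, and for all $j \ge 1$, $1t^j = n$ or $1t^j \ne it^j$ for $1 < i < n$); the names `in_B_ff`, `generated`, `count_B_ff` and `PAIRS` are hypothetical.

```python
from itertools import product

def compose(s, t):
    """Right action: i(st) = (is)t."""
    return tuple(t[s[i] - 1] for i in range(len(s)))

def in_B_ff(t):
    """Membership in B_ff(n) under the assumed definition of B_sf(n)."""
    n = len(t)
    if 1 in t or t[n - 1] != n or t[n - 2] != n:
        return False
    seen, power = set(), t
    while power not in seen:
        seen.add(power)
        one, others = power[0], power[1:n - 2]    # images of 2..n-2 under t^j
        if one != n and one in others:            # suffix-free condition
            return False
        if one == n - 1 and any(x != n for x in others):   # factor-free condition
            return False
        power = compose(power, t)
    return True

def generated(gens):
    """Semigroup generated by gens (closure under right multiplication)."""
    seen, frontier = set(gens), list(gens)
    while frontier:
        s = frontier.pop()
        for g in gens:
            p = compose(s, g)
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

def count_B_ff(n):
    return sum(in_B_ff(imgs + (n, n))
               for imgs in product(range(2, n + 1), repeat=n - 2))

# (tau_i, t_i) as listed in the proof for n = 5.
PAIRS = [
    ((2, 3, 4, 5, 5), (5, 2, 2, 5, 5)),
    ((2, 3, 5, 5, 5), (5, 4, 2, 5, 5)),
    ((2, 5, 3, 5, 5), (5, 3, 3, 5, 5)),
    ((3, 2, 5, 5, 5), (5, 2, 4, 5, 5)),
    ((3, 4, 2, 5, 5), (5, 3, 2, 5, 5)),
    ((3, 5, 2, 5, 5), (5, 3, 4, 5, 5)),
]

if __name__ == "__main__":
    print(count_B_ff(4), count_B_ff(5))
    for tau, t in PAIRS:
        bad = [x for x in generated([tau, t]) if not in_B_ff(x)]
        print(tau, t, len(bad) > 0)
```

For five of the six pairs the product $\tau_i t_i$ already lies outside $B_{ff}(5)$; for the pair $(\tau_5, t_5)$ a longer product is needed, for instance $\tau_5 t_5 \tau_5 = [4,5,2,5,5]$.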

8 Quotient Complexity of the Reversal of Free Languages 

It has been shown in [3] that for certain regular languages with maximal syntactic 
complexity, the reverse languages have maximal quotient complexity. This is also 
true for some free languages, as we now show. 

In this section we consider non-deterministic finite automata (NFA). An NFA
$\mathcal{N}$ is a quintuple $\mathcal{N} = (Q, \Sigma, \delta, I, F)$, where $Q$, $\Sigma$, and $F$ are as in a DFA,
$\delta : Q \times \Sigma \to 2^Q$ is the non-deterministic transition function, and $I$ is the set
of initial states. For any word $w \in \Sigma^*$, the reverse of $w$ is defined inductively
as follows: $w^R = \varepsilon$ if $w = \varepsilon$, and $w^R = u^R a$ if $w = au$ for some $a \in \Sigma$ and
$u \in \Sigma^*$. The reverse of any language $L$ is the language $L^R = \{w^R \mid w \in L\}$. For
any finite automaton (DFA or NFA) $\mathcal{M}$, we denote by $\mathcal{M}^R$ the automaton
obtained by reversing the transitions of $\mathcal{M}$ and exchanging the roles of initial states
and accepting states, and by $\mathcal{M}^D$ the DFA obtained by applying the subset construction to $\mathcal{M}$.
Then $L(\mathcal{M}^R) = (L(\mathcal{M}))^R$, and $L(\mathcal{M}^D) = L(\mathcal{M})$. To simplify our proofs, we use
an observation from [5] that, for any NFA $\mathcal{N}$ whose states are all reachable, if
the automaton $\mathcal{N}^R$ is deterministic, then the DFA $\mathcal{N}^D$ is minimal.

Theorem 11. The reverse of the prefix-free regular language accepted by the
DFA $\mathcal{A}_n$ of Theorem\^ restricted to $\{a, c, d_{n-2}\}$ has $2^{n-2} + 1$ quotients, which is
the maximum possible for a prefix-free regular language.

Proof. Let $\mathcal{B}_n$ be the DFA $\mathcal{A}_n$ restricted to $\{a, c, d_{n-2}\}$. Since $L(\mathcal{A}_n)$ is prefix-free,
so is $L_n = L(\mathcal{B}_n)$. We show that $\kappa(L_n^R) = 2^{n-2} + 1$.

Let $\mathcal{N}_n$ be the NFA obtained by removing unreachable states from the NFA
$\mathcal{B}_n^R$. (See Fig. 4 for $\mathcal{N}_6$.) We first prove that the following $2^{n-2} + 1$ sets of states
of $\mathcal{N}_n$ are reachable: $\{\{n-1\}\} \cup \{S \mid S \subseteq \{1, \ldots, n-2\}\}$.



Fig. 4. NFA $\mathcal{N}_6$ of $L_6^R$ with quotient complexity $\kappa(L_6^R) = 17$; empty state omitted.



The singleton set {n — 1} of initial states of A/'„ is reached by e. From {n — 1} 
we reach the empty set by a. The set {n — 2} is reached by (i„_2 from {n — 1}, and 
from here, {1} is reached by a"~^. From any set {1,2,..., i}, where 1 ^ i < n — 2, 
we reach {1, 2, . . . , j, i + 1} by ca"~^. Thus we reach {1, 2, . . . , n — 2} from {1} by 
{ca"~^)"~^ . Now assume that any set S of cardinality I ^ n — 2 can be reached; 
then we can get a set of cardinality / — 1 by deleting an element j from 5* by 
applying dn-20-"~^~^ ■ Hence all the subsets of {1, 2, . . . , n — 2} can be reached. 

The automaton $\mathcal{N}_n^R$ is a subset of $\mathcal{A}_n$, and it is deterministic. Then $\mathcal{N}_n^D$ is
minimal. Hence $\kappa(L_n^R) = 2^{n-2} + 1$, which is the maximal quotient complexity of
reversal of prefix-free languages as shown in [10]. □

It is interesting that, for suffix-, bifix-, and factor-free regular languages, al- 
though we don't have tight upper bounds on their syntactic complexities, some 
languages in these classes with large syntactic complexities have their reverse 
languages reaching the upper bounds on the quotient complexities for the rever- 
sal operation. 

Theorem 12. The reverse of the suffix-free regular language accepted by the
DFA $\mathcal{A}'_n$ of Theorem 6 restricted to $\{a_1, a_2, a_3, c\}$ has $2^{n-2} + 1$ quotients, which
is the maximum possible for a suffix-free regular language.

Proof. Let $\mathcal{C}_n$ be the DFA $\mathcal{A}'_n$ restricted to the alphabet $\{a_1, a_2, a_3, c\}$. Since
$L(\mathcal{A}'_n)$ is suffix-free, so is $L'_n = L(\mathcal{C}_n)$. Let $\mathcal{N}'_n$ be the NFA obtained from $\mathcal{C}_n^R$ by
removing unreachable states. Figure 5 shows the NFA $\mathcal{N}'_6$.



Fig. 5. NFA $\mathcal{N}'_6$ of $L'^R_6$ with quotient complexity $\kappa(L'^R_6) = 17$; empty state omitted.

Apply the subset construction to A/"^, we get a DFA JV^f . Its initial state 
is a singleton set {2}. From the initial state, we can reach state {2,3,..., i} by 
(030""^)'"^, where 3 ^ i ^ n — 1. Then the state {2, 3, . . . , n — 1} is reached from 
{2} by (030""^)""^. Assume that any set S of cardinality / can be reached, where 
2 s^l i^n-2.Mj e S, then we can reach S' ^ S\ {j} from S by aj'^OsOi 
So all the nonempty subsets of {2, 3, . . . , n — 1} can be reached. We can also 
reach the singleton set {1} from {2} by c, and, from there, the empty state by 
c again. Hence TV^^ has 2"^^ + 1 reachable states. 



Since the automaton $\mathcal{N}'^R_n$, the reverse of $\mathcal{N}'_n$, is a subset of $\mathcal{C}_n$, it is deterministic;
hence $\mathcal{N}'^D_n$ is minimal. Then the quotient complexity of $L'^R_n$ is $2^{n-2} + 1$,
which meets the upper bound for reversal of suffix-free regular languages [9]. □
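
The count of reachable subsets can be reproduced by running the subset construction on the reversed automaton. The sketch below is an illustration under stated assumptions: the letters are read as $a_1$ = the cycle $(2, \ldots, n-1)$, $a_2$ = the transposition $(2,3)$, $a_3$ = the map sending $n-1$ to 2 (each also sending 1 to $n$), and $c = [2, n, \ldots, n]$, with 2 as the only accepting state of the DFA. The function name `reversal_state_count` is hypothetical.

```python
def reversal_state_count(n):
    """Number of reachable subsets in the determinized reverse of the DFA.

    Transformations are image tuples (1t, ..., nt); the reverse automaton
    starts in {2} (the accepting state of the DFA) and moves from a set S
    to its preimage under the letter read.
    """
    def make(changes):
        t = list(range(1, n + 1))
        t[0] = n                                  # a_1, a_2, a_3 send 1 to n
        for src, dst in changes.items():
            t[src - 1] = dst
        return tuple(t)

    a1 = make({**{i: i + 1 for i in range(2, n - 1)}, n - 1: 2})
    a2 = make({2: 3, 3: 2})
    a3 = make({n - 1: 2})
    c = (2,) + (n,) * (n - 1)
    letters = [a1, a2, a3, c]

    start = frozenset({2})
    seen, stack = {start}, [start]
    while stack:
        subset = stack.pop()
        for t in letters:
            pre = frozenset(p for p in range(1, n + 1) if t[p - 1] in subset)
            if pre not in seen:
                seen.add(pre)
                stack.append(pre)
    return len(seen)

if __name__ == "__main__":
    for n in (5, 6, 7):
        print(n, reversal_state_count(n), 2 ** (n - 2) + 1)
```

The reachable sets are the non-empty subsets of $\{2, \ldots, n-1\}$ together with $\{1\}$ and the empty set, which gives $2^{n-2} + 1$ in total; for $n = 6$ this is the value 17 quoted in the caption of Fig. 5.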

Theorem 13. The reverse of the factor-free regular language accepted by the
DFA $\mathcal{A}_n$ of Theorem 10 restricted to the alphabet $\{a_1, a_2, a_3, c\}$, where $c = [2, n-1,
n, \ldots, n, n] \in G_{ff}(n)$, has $2^{n-3} + 2$ quotients, which is the maximum possible
for a bifix- or factor-free regular language.

Proof. Let $\mathcal{D}_n$ be the DFA $\mathcal{A}_n$ restricted to the alphabet $\{a_1, a_2, a_3, c\}$; then
$L''_n = L(\mathcal{D}_n)$ is factor-free. Let $\mathcal{N}''_n$ be the NFA obtained from $\mathcal{D}_n^R$ by removing
unreachable states. An example of $\mathcal{N}''_n$ is shown in Figure 6.



Fig. 6. NFA $\mathcal{N}''_7$ of $L''^R_7$ with quotient complexity $\kappa(L''^R_7) = 18$; empty state omitted.

Note that $\mathcal{N}''_n$ can be obtained from the NFA $\mathcal{N}'_{n-1}$ in Theorem 12 by adding
a new state $n-1$, which is the only initial state in $\mathcal{N}''_n$, and the transition
from $\{n-1\}$ to $\{2\}$ under input $c$. We know that all non-empty subsets of
$\{2, 3, \ldots, n-2\}$ are reachable from $\{2\}$. The accepting state $\{1\}$ is also reachable
from $\{2\}$. From the initial state $n-1$, we reach the empty state under input $a_1$.
Then $\mathcal{N}''^D_n$ has $2^{n-3} + 2$ reachable states.

Since $\mathcal{N}''^R_n$ is a subset of $\mathcal{D}_n$ and it is deterministic, the DFA $\mathcal{N}''^D_n$ is minimal.
Therefore $\kappa(L''^R_n) = 2^{n-3} + 2$, and it reaches the upper bound for reversal of
both bifix- and factor-free regular languages with quotient complexity $n$ [4]. □

9 Conclusions 

Our results are summarized in Tables 2 and 3. Each cell of Table 2 shows the
syntactic complexity bounds of prefix- and suffix-free regular languages, in that
order, with a particular alphabet size. Table 3 is structured similarly for bifix-
and factor-free regular languages. The figures in bold type are tight bounds verified
by GAP. To compute the bounds for suffix-, bifix-, and factor-free languages,
we enumerated semigroups generated by elements of $B_{sf}(n)$, $B_{bf}(n)$, and $B_{ff}(n)$
that are contained in $B_{sf}(n)$, $B_{bf}(n)$, and $B_{ff}(n)$, respectively, and recorded the
largest ones. By Propositions El [51 [TSl we obtained the desired bounds from the
enumeration. The asterisk * indicates that the bound is already tight for a smaller
alphabet. In Table 2, the last four rows include the tight upper bound $n^{n-2}$ for
prefix-free languages, $w^{\le 5}_{sf}(n)$, which is a tight upper bound for $2 \le n \le 5$ for
suffix-free languages, conjectured upper bound $w^{\ge 6}_{sf}(n)$ for suffix-free languages,
and a weaker upper bound $b_{sf}(n)$ for suffix-free languages. In Table 3, the last
four rows include $w^{\le 5}_{bf}(n)$, which is a tight upper bound for bifix-free languages
for $2 \le n \le 5$, conjectured upper bounds $w^{\ge 6}_{bf}(n)$ for bifix-free languages and
$w_{ff}(n)$ for factor-free languages, and weaker upper bounds $b_{bf}(n)$ for bifix-free
languages and $b_{ff}(n)$ for factor-free languages.

Table 2. Syntactic complexities of prefix- and suffix-free regular languages. 





                      n = 2    n = 3    n = 4    n = 5     n = 6
  |Σ| = 1               1        2        3        4         5
  |Σ| = 2               *       3/3     11/11    49/49       ?
  |Σ| = 3               *        *      14/13    95/61       ?
  |Σ| = 4               *        *      16/*    110/67       ?
  |Σ| = 5               *        *        *     119/73       ?
  |Σ| = 6               *        *        *     125/*      ?/501
  |Σ| = 7               *        *        *        *      1296/?
  |Σ| = 8               *        *        *        *       */629

  $n^{n-2}$             1        3       16       125      1296
  $w^{\le 5}_{sf}(n)$   1        3       13        73       501
  $w^{\ge 6}_{sf}(n)$   1        3       11        67       629
  $b_{sf}(n)$           1        3       15       115      1169



Table 3. Syntactic complexities of bifix- and factor-free regular languages. 





                          n = 2    n = 3    n = 4    n = 5     n = 6
  |Σ| = 1                   1        2        3        4         5
  |Σ| = 2                   *        *       7/6     20/12       ?
  |Σ| = 3                   *        *        *      31/16       ?
  |Σ| = 4                   *        *        *      32/19       ?
  |Σ| = 5                   *        *        *      33/20       ?
  |Σ| = 6                   *        *        *      34/?        ?

  $w^{\le 5}_{bf}(n)$       1        2        7       34        209
  $w^{\ge 6}_{bf}(n)$       1        2        7       33        213
  $w_{ff}(n)$               1        2        6       25        150
  $b_{bf}(n)/b_{ff}(n)$    1/1      2/2      7/6     41/31    339/246